A simulator study was conducted to improve training performance
measurement selection methods, apply the results to an automated
flight training system, and evaluate the resulting measurement
during automated training of four instrument flight maneuvers.

Empirical methods were used to select, from an analytically
derived set, those measures which had the ability to discriminate
between early and later training performance. The multiple
discriminant model emerged as the best technique, but the algorithm
for its use was highly modified. The automated trainer was then
modified to operate on three measurement subsystems: (1) the
original scoring algorithm, (2) the measures and weighting
coefficients based on multiple discriminant analysis results, and
(3) the original scoring algorithm using measured normative data.

The resulting measurement was evaluated by automatically
training three matched groups of five civilian pilots each, with
the result that time-to-train was reduced 34-40% for pilots
training with empirically derived measures over the original
scoring algorithm. It was recommended that data collection at
an operational site be undertaken to verify the methods and to
produce information that might lead to a measurement specification
for future devices. Recommendations concerning the design of
adaptive logics were made.

SUMMARY

The development, implementation and empirical evaluation
of a man-vehicle training performance measurement method is
reported herein. Initial work by Vreuls, Obermayer, Lauber
and Goldstein (1973) emphasized the development of a descriptive
structure for obtaining measurement in a man-vehicle training
situation, starting with analytical specification of measures.
Because analytic methods alone produced certain deficiencies in
measurement, the next effort (Vreuls, Obermayer and
Goldstein, 1974) centered on the initial exploration of empirical
measure selection techniques to be used in conjunction
with the descriptive model.

Phase I of the present effort concentrated on further
refinement of measure selection methods and application of those
methods to empirically derived data for the purpose of recommending
measures for use in automated instrument flight simulator
training. The criterion for selection of analytically defined
measurement was that each measure had to be able to discriminate
between early and later training. Tests of significant
changes for individual measures, correlations between measures,
multiple discriminant analyses and canonical correlation analyses
were explored. A modified form of the multiple discriminant
analysis appeared most suitable for the purpose. Measures,
weighting coefficients, and measurement start and stop conditions
resulted from analysis of data obtained from the training of 12
pilots on four instrument maneuvers.

Phase II focused on the insertion of Phase I measurement
results into the automated Instrument Flight Maneuvers flight
simulator (TRADEC/IFM) at NAVTRAEQUIPCEN in real time, and the
development of a rationale to map the new and somewhat different
measure sets into an adaptive logic, or task scheduler, which was
designed to accept slightly different information. IFM was
modified to operate with three measurement subsystems: (1) the
original scoring algorithm, (2) the recommended measures from
Phase I, and (3) the original scoring algorithm modified on the
basis of normative data obtained in Phase I.

The resulting measurement subsystems were evaluated in Phase
III by automatically training three matched groups of five
civilian pilots each. The time-to-train each group to the same
performance criteria was reduced 34-40% for both empirically
derived measure groups (2 and 3 above) over the original,
analytically defined measurement algorithm. The discriminant
measures appeared to be sensitive to piloting technique and to
provide more reliable performance feedback. Also, the discriminant
model appeared to have potential for growth to higher efficiency
levels than reported because of its ability to select and
properly weight important student variables along with system
performance. Potentially serious inefficiencies with linear,
single score adaptive logics were observed and discussed.

The results were encouraging enough to recommend that data
collection at an operational training site be undertaken to
verify the measure selection methods within the context of a
military flight training environment, and to produce data which
might lead to eventual measurement specification for future
training devices for the class of aircraft and maneuvers flown.
Recommendations were made also for improvement of adaptive
logics similar to IFM and for a relatively inexpensive study that
might resolve adaptive logic inefficiency and provide valuable
guidance to designers.

SECTION I

INTRODUCTION  AND  TECHNICAL  SUMMARY 

Measurement produces information which is needed for
assessment of trainee performance, subsequent control of
training and for training effectiveness evaluation. Improvements
in training efficiency and evaluation of training methods
are quite dependent on improved measurement. Any device,
system or process which is to control or evaluate training
will be only as effective as its information sources.

In order to measure many of the complex dimensions of
man-machine system training performance, the processing of
large amounts of continuously varying information is required.
Such measurement is beyond the capability of manual or simple
measurement devices; it must be automated in order to produce
information in time for effective control of training.

Automated measurement places severe demands on the definition
of (a) fool-proof algorithms for determining the conditions
during which measurement is to occur, and (b) measure sets
which produce only the information necessary for effective use
by the information receiving system. Too much information
can overload the user; not enough information might reduce
user effectiveness.

Historically, performance measures have been specified by
analyses of knowledges, tasks, mission requirements and
performance standards drawn from experience or consensus of
experts. Analytically derived measurement is likely to
include (a) different measures of the same or closely related
behavior, (b) measures which may prove to be unimportant and
(c) measurement based on oversimplified or inaccurate criteria.
Although measurement development must start with a good
analysis, empirical techniques are required to overcome analytic
difficulties and reduce measurement to a small, efficient set.
The reduction of analytically defined measures into a set
which can be shown to have the desired properties is called
the measure selection process herein.

Previous work has established and tested (a) a descriptive
structure, or model, for obtaining measurement in man-machine
training and (b) measure selection methods based on multivariate
statistical models which evaluate the total set of measures
taken together, and produce valuable weighting coefficients.
This work led to the present three phase study to (a)
refine the measure selection methods, (b) apply the results to
an automated flight training system and (c) conduct tests and
evaluations of the resulting measurement. Since this report
is quite lengthy, a technical summary of the work is presented
in the following pages of this section.

MEASURE SELECTION SUMMARY

The purpose of Phase I was to improve measure selection
methods while developing measures for an experimental automated
Instrument Flight Maneuvers (IFM) training system located
at NAVTRAEQUIPCEN. The automated system was modified to control
a measurement study (rather than automatically train).
Data were collected on magnetic tape while 12 low-time pilots
underwent 18 one-hour training sessions on four instrument
flight maneuvers.

The resulting data were used for measure selection analyses
at the conclusion of training. Initially, an average of
16 performance measures was produced for each maneuver and
measure segment. Correlational analyses of redundant information
reduced the average number of measures from 16 to 12.
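
The correlational reduction described above can be illustrated with
a minimal sketch. It is not the procedure used in the study: the
measure names, the sample values and the 0.9 correlation threshold
are all assumptions made for illustration. A candidate measure is
kept only if it is not highly correlated with a measure already kept.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def drop_redundant(measures, threshold=0.9):
    """Keep a measure only if it is not highly correlated with one already kept.

    `measures` maps a measure name to its observed values.
    The 0.9 threshold is an illustrative assumption, not a value from the study.
    """
    kept = []
    for name, values in measures.items():
        if all(abs(pearson(values, measures[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical candidate measures observed on five trials.
candidates = {
    "altitude_rms_error": [120.0, 95.0, 80.0, 60.0, 42.0],
    "altitude_abs_error": [100.0, 80.0, 66.0, 50.0, 35.0],  # nearly the same information
    "stick_reversals": [14.0, 9.0, 15.0, 8.0, 12.0],
}
print(drop_redundant(candidates))
```

In this sketch the order of the candidates decides which member of a
redundant pair survives; an investigator would normally make that
choice on interpretive grounds.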

A multiple discriminant analysis was used to find the measures
and weighting coefficients that would best describe the
change in performance from early to later training; an average
of six measures was found to be important for each measure
segment. With the addition of specified outer loop measures,
the recommended set, which averaged 9 measures, could be weighted
and summed into a single score, the discriminant function.

Canonical correlation analyses were explored also to
uncover predictive relationships between measure sets early and
late in training. They produced an average of seven measures
per maneuver. They also produced asymmetrical predictive and
criterion sets that were difficult to interpret and relate
to the multiple discriminant analysis results. Since the multiple
discriminant analysis can be interpreted as a form of
prediction, and the results were difficult to bring together,
the canonical correlation was omitted from further development.

STATISTICAL  PROBLEM.  Due  to  experimental  design  restrictions, 
four  problems  of  a statistical  nature  arose  because  of  our 
desire  to  use  the  multiple  discriminant  analysis.  The  first 
problem  was  that  the  mathematics  of  multivariate  methods 
demand  that  there  be  more  independent  observations  in  each 
treatment  group  than  unique  variables.  Although  in  each  day 
there  were  144  observations  and  only  an  average  of  16  variables, 
the  observations  could  not  be  considered  "independent" 
because only 12 subjects were observed (12 times each day).

The  second  problem  came  from  the  underlying  assumption 
in  the  derivation  of  multiple  discriminant  analysis  that  the 
various  treatment  groups  be  independent.  Since  each  subject 
was  measured  in  all  of  the  treatment  groups,  the  experimental 
design  also  failed  the  requirement  of  independent  groups. 

A third problem arose because we planned to use the weights
derived from Phase I data in a subsequent application with a
new group of students. The reliability of weighting coefficients
from application to application has been questioned.

A fourth and final problem arose because the weighting
coefficients for maneuvers flown with turbulence could not be
determined with accuracy: there were no turbulence
runs early in training to be paired with later turbulence runs
for the discriminant analysis.

STATISTICAL PROBLEM SOLUTION. A method was derived from the
literature to remove the components of variance due to both
repeated observations within a group and repeated observations
between groups (the first two problems above). As can be
seen in Section II, the method was similar to those used in
univariate statistics for repeated observations.

It was discovered also in the literature that a technique
for improving the predictive reliability of weighting coefficients
had been developed for the multiple regression analysis.
The method, called "Ridge Regression," incrementally adds a
small bias to the diagonal of the intercorrelation matrix
prior to multiple regression. As the bias is added, the weighting
coefficients can be seen to asymptote to stable values.

The third problem, weighting coefficient reliability, was
solved by adding a small bias to the "within groups" matrix
in the discriminant analysis (similar to a ridge regression).
The results markedly changed the extreme values of certain
coefficients without altering the power of the discrimination.
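
The effect of a ridge-type bias on discriminant weights can be
sketched for the simple two-group case (early versus later training).
This is an illustration only and not the modified algorithm used in
the study: the data are invented, the function names are hypothetical,
and the bias is added here to the raw within-groups sums of squares
and cross-products matrix, which is an assumption about scaling. The
weights are taken as the solution of (W + kI) w = d, where W is the
within-groups matrix, k the bias and d the difference of group means.

```python
def mean_vector(rows):
    """Column means of a list of observation rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def within_groups_matrix(groups):
    """Pooled within-groups sums of squares and cross-products."""
    p = len(groups[0][0])
    w = [[0.0] * p for _ in range(p)]
    for rows in groups:
        m = mean_vector(rows)
        for r in rows:
            d = [r[j] - m[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    w[i][j] += d[i] * d[j]
    return w

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def discriminant_weights(early, late, bias=0.0):
    """Two-group discriminant weights with `bias` added to the diagonal of W."""
    w = within_groups_matrix([early, late])
    for i in range(len(w)):
        w[i][i] += bias
    diff = [l - e for e, l in zip(mean_vector(early), mean_vector(late))]
    return solve(w, diff)

# Invented data: two nearly redundant error measures, early and late in training.
early = [[10.0, 10.1], [12.0, 11.9], [11.0, 11.2], [13.0, 12.9]]
late = [[6.0, 6.1], [7.0, 6.8], [5.0, 5.1], [8.0, 8.1]]
for bias in (0.0, 0.1, 0.5, 1.0):
    print(bias, discriminant_weights(early, late, bias))
```

With highly intercorrelated measures the unbiased weights tend to be
large and of opposite sign; as the bias grows they shrink toward more
moderate values, which is the behavior the ridge approach exploits.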

The  fourth  problem  was  alleviated  by  removing  turbulence 
from  the  syllabus  for  subsequent  implementation  and  evaluation 
phases. 

A small mathematical controversy still lingers over the
solution to the statistical problems of this study. These
arguments are being researched and described in a separate
study entitled "Statistical Issues." These statistical issues
were considered more or less fine tuning in their relation to
the overall measurement system and not to have a large impact
on the concept of the discriminant measurement system.

MEASUREMENT  IMPLEMENTATION  SUMMARY 

The  purpose  of  Phase  II  was  the  implementation  of  measures, 
weighting  coefficients  and  conditional  expressions  to  start 
and  stop  measurement  (from  Phase  I)  in  the  IFM  system  so  that 
it  could  train  in  the  automated  mode  with  three  measurement 
subsystems  in  the  Phase  III  evaluation.  The  three  subsystems 
to  be  used  were  (1)  original  IFM  scoring,  (2)  scoring  based 
on  discriminant  analysis  results  and  (3)  scoring  based  on  the 
original  IFM  measures,  corrected  for  measured  performance  norms. 

One major technical challenge was to make the new measurement
system operate in real time. The basic flight program
required solutions of the aerodynamic equations every 50
milliseconds, and it took about 35 milliseconds to process the
equations themselves. That left about 15 milliseconds to perform
all of the existing IFM functions (which make the flight
system an automated trainer) and to process the new measurement
and measure start and stop functions, which were more complex
than the original. Modular and somewhat hierarchical software
design, elimination of "nice-to-have" but unnecessary
real-time performance plots, and rearrangement of background
and foreground processing functions provided a solution to the
processing problem.

Another major technical challenge was to develop a
method to scale discriminant measurement from Phase I in such a
way that no substantial changes in the existing adaptive logic
would be required. The newly developed measurement was quite
different in dynamic range and statistical properties from
the original IFM measurement. Since Phase III tests were
planned to evaluate measurement system differences, any
adaptive logic change required by the different measurement
systems could confound the evaluation, and was undesirable.

Analyses  of  the  original  IFM  design  rationale  provided  a 
solution.  The  original  IFM  measurement  and  adaptive  logic 
design  philosophy  was  based  on  assumptions  of  performance  norms 
for  experienced  naval  aviators  which  were  derived  from 
NATOPS  standards.  The  score  which  represented  one  and  two 
standard  deviation  performance  could  be  expressed  from  these 
assumptions,  and  the  adaptive  logic  algorithm  was  built  on  that 
premise.  It  was  not  possible  to  relate  the  assumptions  of 
NATOPS  standards  to  the  more  complex,  discriminant  measurement. 

It was possible to empirically define criterion performance
norms from measured performance data with the discriminant
measurement, and to relate the old and discriminant measure
distributions. When this was done, scoring of new measurement
on each trial relative to the criterion performance
distributions (expressed as z-scores) provided a method to
equate the performance evaluation decisions by the adaptive
logic for all measurement schemes.
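
A minimal sketch of the z-score idea follows. The criterion score
samples are invented, the function name is hypothetical, and the use
of the population standard deviation is an assumption; the point is
only that scores with very different dynamic ranges land on a common
scale once each is expressed relative to its own criterion
distribution.

```python
from statistics import mean, pstdev

def criterion_z_score(trial_score, criterion_scores):
    """Express a trial score in standard deviations from the criterion mean."""
    mu = mean(criterion_scores)
    sigma = pstdev(criterion_scores)  # population SD; an assumption of this sketch
    return (trial_score - mu) / sigma

# Invented criterion distributions for two measurement schemes
# whose raw scores differ in range by a factor of ten.
old_criterion = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
new_criterion = [20.0, 40.0, 40.0, 40.0, 50.0, 50.0, 70.0, 90.0]

print(criterion_z_score(7.0, old_criterion))
print(criterion_z_score(70.0, new_criterion))
```

Because both trial scores sit the same distance from their criterion
means in standard deviation units, a decision rule written in z-score
terms treats them identically, whichever scheme produced them.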

In the process of working through this problem, it
was noticed that the actual IFM performance score distributions
were quite different from the assumed norms for the experienced
naval aviators. One obviously simple and good way of improving
measurement (of this type) would be to base measurement decisions
on actual norms, rather than assumed norms. A third
measurement subsystem based on actual IFM norms was designed
and installed.

System engineering tests were conducted with two trainees
with the result that real time measurement was achieved, and all
measurement subsystems operated properly except the discriminant
subsystem. The discriminant model measurement occasionally could
misclassify very poor performance if that poor performance was on
a negatively weighted measure. The problem was found to be caused
by poor performance exceeding the measurement space of the Phase
I data. (The model was only valid within the measurement
space of the data from which it was derived.) The problem was
solved by establishing the 4-sigma boundaries for negatively
weighted measures, and altering the real time discriminant
scoring subsystem to first test for data boundaries. If the
boundary was exceeded, the score was set to 2.7-sigma (a poor
score). If the boundary was not exceeded for any negatively
weighted measure, the discriminant function was computed.
Subsequent tests were successful.
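
The boundary test can be sketched as follows. Only the 4-sigma
boundary and the 2.7-sigma substitute score come from the text; the
function name, the representation of each measure as a z-score
relative to the Phase I data, the two-sided boundary test and the
example weights are assumptions made for illustration.

```python
def bounded_discriminant_score(z_measures, weights,
                               boundary_sigma=4.0, poor_score=2.7):
    """Discriminant score guarded by a data-boundary test.

    `z_measures` are measures in sigma units of the Phase I data
    (an assumed representation). If any negatively weighted measure
    lies beyond the boundary, a fixed poor score is returned instead
    of the weighted sum.
    """
    for z, w in zip(z_measures, weights):
        if w < 0 and abs(z) > boundary_sigma:
            return poor_score
    return sum(w * z for w, z in zip(weights, z_measures))

weights = [0.5, -0.3]  # hypothetical weights; the second is negative

# Within the measurement space: the weighted sum is used.
print(bounded_discriminant_score([1.0, 2.0], weights))

# Far outside the measurement space on the negatively weighted measure:
# the unguarded sum would be 0.5 - 1.5 = -1.0, which could be mistaken
# for good performance, so the poor score is substituted.
print(bounded_discriminant_score([1.0, 5.0], weights))
```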

MEASUREMENT  EVALUATION  SUMMARY 

The resulting three measurement subsystems were evaluated
in Phase III by automatically training three matched groups
of five civilian pilots each on the modified TRADEC/IFM. Group
I was trained using old IFM scoring. Group II was trained with
discriminant model scoring. Group III was trained with normative
IFM scoring. The raw results of time-to-train to the same
performance criteria revealed that the discriminant model was
far superior to either of the other measurement subsystems.
However, the distributions of group matching variables were
found to be unequal, biasing the results.

Removal  of  the  significant  sources  of  group  bias  (first 
score  in  the  simulator  and  age)  resulted  in  a 34-40%  improvement 
in  the  time-to-train  using  both  the  discriminant  model  and  the 
normative  IFM  model.  Examination  of  typical  trainee  plots  and 
a breakdown  of  the  number  of  trials  required  to  graduate  from 
each  maneuver  suggested  that  the  discriminant  model  provided 
more  reliable  performance  feedback  and  it  appeared  to  sense 
piloting  technique. 

The  discriminant  model  was  hypothesized  to  have  greater 
growth  potential  than  the  normative  IFM  model  because  of  the 
importance  of  student  variables  (such  as  age)  and  its  ability 
to  choose  and  properly  weight  significant  student  variables 
along  with  system  performance  variables  and  the  measures  of 
control  activity. 

The  evaluation  data  also  highlighted  some  potentially 
serious  inefficiencies  in  the  linear,  single  score  adaptive 
logic  design  as  it  interacted  with  the  syllabus  and  measurement 
system.  These  problems  are  discussed  in  Appendix  F,  and  a 
study  to  explore  more  efficient  logics  is  recommended. 

The major conclusion and recommendation of the study was
that the discriminant model should be applied to the problem of
specifying measures for future flight training systems. In
order to do that, empirical data must be collected at an
operational training site to produce data for measure selection
analyses. Additionally, the results of the measure selection
analysis should be used to validate the effect of improved
measurement on training, similar to the methods employed in
this study.

Remaining conclusions and recommendations can be found
in summary outline form in Sections VI and VII.

SECTION II

MEASURE  SELECTION  METHOD 


A combined  analytic  and  empirical  method  was  used  to  define 
measures  for  automated  training  of  four  instrument  flight 
maneuvers.  The  method  was  based  on  the  criteria  that  the  final 
measure  set  should  represent  a comprehensive,  yet  minimum  set  of 
measures  which  (a)  were  sensitive  to  the  skill  change  that 
occurred  during  training,  (b)  had  performance  prediction 
qualities,  and  (c)  which  tended  to  eliminate  redundant  forms  of 
information.  These  criteria  for  measurement  selection  and  the 
fundamental  techniques  and  algorithms  for  selecting  measures  were 
developed  and  elaborated  upon  in  earlier  work  (Vreuls,  Obermayer, 
Goldstein and Lauber, 1973; Vreuls, Obermayer and Goldstein,
1974).

MEASURE  SELECTION  PROCESS  SUMMARY 

The  measure  selection  process  contained  a series  of  related 
critical  steps  which  began  with  an  analysis  of  potential 
information  needs  for  training.  This  first  step  involved  the 
specification  of  performance  measure  candidates  (candidates  for 
empirical  selection  analyses)  which  in  the  judgment  of  the 
investigators  (armed  with  data  from  earlier  studies  and  sample 
analysis  data)  would  contain  information  of  importance  to  the 
adaptive logic which was to control training.

Next,  the  required  raw  data  parameters,  such  as  (but  not 
limited  to)  vehicular  state  variables  and  their  desired  sampling 
rates  were  defined.  Typically,  raw  data  parameters  were  not  in 
a form  that  was  useful  for  automated  measurement;  however, 
error  from  desired  values  and  transformations  such  as  the  average 
error  contained  the  desired  information.  Potentially  useful 
candidate  measures  were  defined  as  transforms  of  parameters. 
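
As an illustration of measures defined as transforms of raw
parameters, the sketch below computes three error transforms for a
sampled altitude trace. Only the average error is named in the text;
the average absolute error and root-mean-square error, the function
names and the sample values are assumptions added for illustration.

```python
from math import sqrt

def average_error(samples, desired):
    """Mean signed error from the desired value."""
    return sum(s - desired for s in samples) / len(samples)

def average_absolute_error(samples, desired):
    """Mean magnitude of error, regardless of sign."""
    return sum(abs(s - desired) for s in samples) / len(samples)

def rms_error(samples, desired):
    """Root-mean-square error, which weights large excursions more heavily."""
    return sqrt(sum((s - desired) ** 2 for s in samples) / len(samples))

# Invented altitude samples (feet) around a desired 25,000 feet.
altitude = [25010.0, 24990.0, 25030.0, 24970.0]
print(average_error(altitude, 25000.0))
print(average_absolute_error(altitude, 25000.0))
print(rms_error(altitude, 25000.0))
```

The example shows why several transforms of one parameter may be
carried as candidates: errors that cancel in the signed average remain
visible in the absolute and root-mean-square forms.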

The conditions which defined when measurement was to start
and stop also were specified. It is emphasized that the difficulty
of specifying unambiguous rules to start measuring and to stop
measuring can be underestimated; in practice, the construction
of start/stop algorithms has been most challenging, and is a
crucial part of performance measurement specification.

Having defined the measures and rules for obtaining
measurement, the next step in the process required collection
of empirical data during training to provide a battery of
candidate measures for selection analyses. Computer measure
selection analyses based on multivariate statistical models were
used to reduce the measures to a final set according to each of
the aforementioned criteria. The outcome of the analysis was
interpreted by the investigators and merged with outer loop
measures to form a final recommended set for each maneuver.
Further computer analyses established the weighting coefficients
to use with each measure when combining the measures into a
composite score for use by the adaptive logic.

The initial measure selection process included a combination
of canonical correlation analysis and discriminant analysis.
This method of selection proved more complex than fruitful and
has been simplified to using discriminant analysis only; the
canonical portions of the research have been deleted from this
report.

APPARATUS 

The test equipment was the Training Device Computer System
(TRADEC) located at the Naval Training Equipment Center, Orlando,
Florida. TRADEC was configured as a fixed-wing aircraft (F-4E).
TRADEC hardware included an XDS Sigma 7 computer and associated
peripherals, an aircraft cockpit mounted on top of a four
degree-of-freedom motion platform (pitch, roll, yaw and heave), and a
host of related equipment. The cockpit contained all of the
controls and displays found in a jet fighter front seat, except
that the radio navigation, communications and weapons systems
were mocked up and non-functional. A digital computer program
provided the basic flight simulation (cf. Kapsis, et al., 1969;
Erickson, et al., 1969).

The basic flight program was converted into a computer-controlled
training device by an automated instrument flight
maneuvers (IFM) program (cf. Charles, Johnson and Swink, 1972).

IFM  automatically  sequenced  the  trainee  through  a series  of 
maneuvers  and  simulated  flight  conditions  ordered  from  least  to 
most  difficult,  as  a function  of  measured  trainee  performance  on 
the  previous  and  antecedent  trials.  The  performance  measures 
and  weighting  coefficients  for  summing  the  various  components  of 
error  into  one  composite  score  were  derived  during  IFM  system 
design  from  task  analytic  data.  The  measures  were  never 
formally  tested. 

As  a part  of  a previous  effort,  IFM  was  modified  to  control 
a measure  selection  experiment  and  to  produce  raw  data  on 
magnetic  tape  for  subsequent  (non-real  time)  conversion  into 
candidate  measures  to  be  used  for  measure  selection  analyses. 

A computer  controlled  speech  synthesizer  (COGNITRONICS)  was 
used  to  brief  participants  on  the  task  requirements  for  each 
trial,  and  issue  corrective  commentary  when  various  vehicle 
states  were  out  of  selected  tolerance  bands  based  on  NATOPS 
performance  criteria.  The  IFM  task  scheduler  was  used  to  set  the 
experimental  conditions  for  the  next  trial  as  prescribed  by  the 
experimental  design. 

PARTICIPANTS 

Twelve relatively low-time student and private pilots were
used as trainees. They averaged 55 hours of flight time and 3.7
hours of prior instrument time, and had a median age of 24 years.
All participants had some familiarity with instrument flight but
were unskilled. All participants were light plane pilots and
were unfamiliar with jet fighter responses.

TASK 


Each participant was trained to fly four basic instrument
flight maneuvers: (a) straight and level flight, (b) standard
rate climbs and descents, (c) level turns and (d) climbing and
descending turns. Aircraft weight and resultant center-of-gravity
shift, and turbulence were varied.

Straight and level flight required the trainees to hold a
heading of 360 degrees, an altitude of 25,000 feet, that is,
Flight Level (FL) 250, and an airspeed of 350 knots for one-half
of the trials and 280 knots for the remainder of the trials. Each
trial lasted one minute.

Standard  rate  climbs  and  descents  required  the  trainees  to 
climb  from  FL  240  to  FL  250,  or  to  descend  from  FL  250  to  FL  240 
at  a standard  rate  of  1,000  feet-per-minute  while  holding  a 360 
degree  heading  and  350  knots  of  airspeed.  One-half  of  the  trials 
were climbs; the other half were descents.

Level  turns  required  constant  bank  (30  degree)  turns  from  a 
heading  of  360  degrees  to  a heading  of  315  degrees  or  045 
degrees  while  holding  FL  250  and  350  knots.  One-half  of  the 
trials  were  left;  the  remainder  were  right. 

Climbing  and  descending  turns  required  a climb  or  descent 
for  1,000  feet  at  1,000  feet-per-minute  while  turning  through 
a 90  degree  heading  change  and  holding  airspeed  at  280  knots; 
the  initial  climb  or  descending  turn  was  followed  by  a reversal 
of  turn  direction  and  altitude  rate,  and  subsequent  return  to 
the  starting  heading  and  altitude.  One-half  of  the  trials  were 
left,  descending  turns  starting  at  FL  250,  followed  by  a right 
climbing  turn  back  to  FL  250  and  heading  360  degrees.  The 
remaining  trials  were  right,  climbing  turns  starting  at  FL  240, 
followed  by  descending,  left  turns  back  to  FL  240  and  heading  of 
360  degrees. 

Two task stressors were used: turbulent air, and aircraft
weight and center of gravity. The turbulent air was produced
in the flight program by a random number generator. When used,
its intensity was set to a "light turbulence" level as defined
by the IFM program. The aircraft weight was either light or
heavy. The light aircraft carried 2500 pounds of fuel, had a
gross weight of 33,600 pounds and a center-of-gravity at 29.0
percent mean aerodynamic chord. The heavy aircraft carried
12,896 pounds of fuel, had a gross weight of 43,996 pounds and a
center-of-gravity at 30.2 percent mean aerodynamic chord. The
weight increase and aft center-of-gravity shift reduced the
longitudinal axis short-period damping coefficient, which
decreased the simulator pitch axis stability, making it more
difficult to control. Task stressors were not changed during a trial.

PROCEDURE

Each participant was given a familiarization flight to
learn the experimental procedures and the simulator. No
participant started the experiment until they had the simulator
under control, according to the judgment of the Test Director,
who was a Commercial Pilot with Instrument, Helicopter,
Sailplane, Multi-Engine, Land and Seaplane ratings.

IFM  trimmed  the  simulator  for  straight  and  level  flight  at 
the  initial  heading,  altitude  and  airspeed  prior  to  the  beginning 
of  each  trial.  The  trainee  was  instructed  on  the  conditions  of 
the  run  by  the  COGNITRONICS  and  told  to  take  control.  In 
addition,  a card  diagramming  each  maneuver  for  each  trial  was 
placed  in  the  cockpit  for  reference.  The  trial  and  data 
collection  were  trainee  initiated  by  placing  the  speed  brake 
switch  forward.  Speed  brake  aerodynamic  effects  were  locked-out 
of  the  simulation  software. 

EXPERIMENTAL  DESIGN 

The  participants  were  trained  on  the  four  basic  instrument 
flight maneuvers for 18 one-hour sessions. A total period of
19 weeks was required to collect the data. Six trials of each
maneuver were flown during each training session. Each
successive pair of odd and even numbered training sessions was
pooled into one unit called a training "day"; thus, sessions 1 and 2
became Day 1, sessions 3 and 4 became Day 2, etc. This pooling
resulted in 144 possible observations for each maneuver on a
given training day (12 participants by 6 trials by 2 sessions).
The design is shown in table 1.
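
The pooling arithmetic can be sketched in a few lines of Python. This is an illustrative fragment only, not part of the original software; the function and constant names are hypothetical.

```python
# Hypothetical names; the counts come from the design described above.
PARTICIPANTS = 12
TRIALS_PER_SESSION = 6   # trials of each maneuver flown in one session
SESSIONS_PER_DAY = 2     # two successive sessions are pooled into one "day"
TOTAL_SESSIONS = 18


def day_of_session(session: int) -> int:
    """Map a one-hour session number (1-18) to its pooled training day (1-9)."""
    if not 1 <= session <= TOTAL_SESSIONS:
        raise ValueError("session must be between 1 and 18")
    return (session + 1) // 2


# Possible observations per maneuver on a given training day.
observations_per_day = PARTICIPANTS * TRIALS_PER_SESSION * SESSIONS_PER_DAY
```

With these definitions, sessions 13 and 14 both map to Day 7, and each maneuver has 144 possible observations per day.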

Each  participant  received  exactly  the  same  order  of 
experimental  trials  on  each  day.  Thus,  maneuver  one  always  was 
flown  first  and  maneuver  four  always  was  flown  last.  This  fixed 
order permitted the study of measures for each maneuver under
identical antecedent conditions (and subsequent order effects)
across  training  days. 

On  Days  1,  3,  5,  and  7 the  trials  were  flown  with  a light 
aircraft  (forward  C.G.)  and  no  turbulence.  A heavy  aircraft 
(aft C.G.) was presented on Days 2, 4, and 6 without turbulence.
Light turbulence was presented on Day 8 with a light aircraft.
Day 9 consisted of a heavy aircraft and light turbulence.

It was assumed that after 14 one-hour training sessions
(the  conclusion  of  Day  7),  the  trainees  would  be  relatively 
proficient  on  the  basic  maneuvers.  Therefore,  a comparison 
of  performance  differences  between  Day  1 and  Day  7 should 
reveal  measures  which  were  sensitive  to  the  skill  change  that 
occurred  and  those  measures  which  had  performance  prediction 
qualities  without  task  stressors  in  operation.  A similar 
comparison  of  Day  2 versus  Day  6 should  reveal  those  measures 
which are sensitive to training when flying with an aft C.G.;
the measure set was not expected to be exactly the same as with
a forward C.G.

TABLE 1. EXPERIMENTAL DESIGN

DAY                  1    2    3    4    5    6    7    8    9
Center of gravity    G1   G2   G1   G2   G1   G2   G1   G1   G2
Turbulence           T1   T1   T1   T1   T1   T1   T1   T2   T2

Maneuvers M1 through M4 were each flown by participants P1 through
P12 on every day.*

Legend:

M = Maneuvers:          M1 = Straight and Level
                        M2 = Standard Rate Climbs and Descents
                        M3 = Level Turns
                        M4 = Climbing and Descending Turns
P = Participants
G = Center of Gravity:  G1 = Light Aircraft, Fore c.g.
                        G2 = Heavy Aircraft, Aft c.g.
T = Turbulence:         T1 = Smooth Air
                        T2 = Light Turbulence

DAY = Two successive one-hour training sessions.

* Twelve trials were administered on each maneuver, each day.

A comparison  of  Day  7 versus  Day  8 performance  was 
anticipated  to  reveal  the  differences  in  measure  set  composition 
caused  by  the  addition  of  turbulence  as  a task  stressor.  Day  7 
versus  Day  9 performance  would  provide  evidence  of  the  measure 
set  differences  caused  by  the  addition  of  both  aft  C.G.  and 
turbulence  as  task  stressors. 

The  development  of  this  experimental  design  presented 
challenges  which  required  compromise  between  theoretical  issues 
and  practical  constraints.  Our  biases  in  attacking  these 
compromises were poignantly expressed by Cooley and Lohnes (1973,
p. v), drawing reference to Tukey (1962):

Tukey argued that there have to be people in the various
sciences who concentrate much of their attention on methods
of analyzing data and of interpreting the results of
statistical analysis. These have to be people who are more
interested in the sciences than in mathematics, who are
temperamentally able to 'seek for scope and usefulness
rather than security,' and who are 'willing to err
moderately often in order that inadequate evidence shall
more often suggest the right answer.' They have to use
scientific judgement more than they use mathematical
judgement, but not the former to the exclusion of the
latter. Especially as they break into new fields of
sciencing, they must be more interested in 'indication
procedures' than in 'conclusion procedures' (or in
conclusions that must be considered statistically weaker).

It was recognized that there was a possibility of confounding
the effects of further training beyond Day 7 with the effects
of task stressors in the experimental design. However, the
design was the only practical one because of the length of time
required to collect data (19 weeks). A full factorial design
with  all  conditions  presented  on  each  training  day  would  have 
reduced  the  number  of  observations  of  a given  condition  to  the 
extent  that  multivariate  measure  selection  techniques  would  not 
have  been  possible,  because  increasing  the  number  of  participants 
and  data  collection  time  was  not  possible  within  the  scope  of  the 
current  effort.  It  was  later  found  that  weighting  coefficients 
for  turbulence  tasks  could  not  be  accurately  determined  and 
turbulence  was  dropped  as  a task  stressor  in  the  final  task 
syllabus  for  evaluation  purposes. 

The  purpose  of  the  study  was  to  select  a minimum  number  of 
measures  from  a larger,  candidate  measure  battery.  Earlier  work 
suggested  that  multivariate  methods  offered  a good  avenue  for 
problem  solution.  Several  mathematical  issues  were  brought  about 
by  our  desire  to  explore  multivariate  models  as  a basis  for 
measure  selection  algorithms. 

One issue was the number of observations, or samples,
required in each experimental group. For measure selection
purposes, Lane (1971) concluded that five to seven times as
many  participants  (samples)  as  initial  measures  in  a battery  are 
required  for  multiple  regression  analysis  to  adequately  address 
shrinkage  and  overfit.  Extrapolating  Lane’s  criterion  to  our 
current  problem  revealed  that  112  to  144  participants  would  be 
required  with  16  initial  candidate  measures  in  the  test  battery. 
Since  it  took  19  weeks  to  collect  data  from  12  participants,  it 
would  have  taken  228  weeks  to  collect  data  from  144  participants. 
Clearly,  this  was  not  possible. 

It  was  possible  to  form  144  observations  for  each  day  of 
training  by  pooling  data  from  12  repeated  trials  of  each  of  12 
participants.  The  consequences  of  pooling  data  in  this  way  to 
produce  a sufficient  number  of  scores  for  proper  operation  of 
the  multivariate  models  were  unclear  at  the  onset  of  the  study. 

A review  of  the  literature  and  informal  consultation  with 
several  statisticians  resulted  only  in  the  conclusion  that  the 
problem  was  a researchable  issue. 

Classical  multivariate  techniques  have  been  used  in 
personnel  selection  and  classification  for  years,  and  are  well 
developed  for  that  purpose.  Most  of  the  literature  addresses 
the  classification  problem,  which  typically  asks  questions 
about  the  probability  of  group  membership  of  an  individual  with 
certain  measured  traits.  These  classification  techniques 
require  familiar  assumptions  of  independent  sampling  of  various 
populations  to  achieve  assumed  multivariate  normal  distributions 
and  equality  of  dispersions. 

Our  research  problem,  however,  was  not  to  assess  the 
probability  of  group  membership,  but  to  find  a method  that  would 
display  measure  changes  for  given  individuals  as  a consequence 
of  their  training. 

The  best  tool  for  finding  measures  appeared  to  be  the 
multiple  discriminant  model  which  is  well  defined  in  the 
following  excerpt  from  Cooley  and  Lohnes  (1971,  p.  243): 

The discriminant model may be interpreted as a special
type of factor analysis that extracts orthogonal factors
of the measurement battery for the specific task of
displaying and capitalising upon differences among the
criterion groups. The model derives the components which
best separate the groups of a taxonomy in the measurement
space. It makes no difference to the formal logic of the
model whether the samples of several populations are
viewed as the dependent, criterion variable and the
discriminant functions are viewed as the best prediction
functions of the independent, predictor vector variable
defining the measurement space, or if the groups are
viewed as the independent treatment variable and the
discriminant functions are seen as the most predictable
functions of the dependent vector variable. The
taxonomic variable is more likely to be the criterion
variable in survey science, whereas it is almost
certain to be the independent treatment variable in
experimental research.

The  latter  definitions  of  variables  appeared  to  fit  the  current 
research problem; groups may be considered to be the independent
variable.  The  different  treatment  groups  (or  days)  represent  a 
continuum  from  early  to  late  training.  Selection  of  specific 
comparisons  constituted  samples  from  the  continuum. 

The  experimental  design  shown  in  table  1 reveals  that  group 
membership  was  fixed  by  assignment  of  the  same  people  to  each 
group (day). We could not increase the number of participants
to  form  two  independent  groups  because  data  collection  time 
would  have  doubled.  Neither  could  we  decrease  the  number  of 
maneuvers  (in  order  to  increase  the  number  of  participants 
within  the  same  data  collection  time  frame)  because  measurement 
information  was  needed  for  each  maneuver  and  each  task  stressor. 

Assignment  of  the  same  people  to  each  group  and  repeated 
observations  in  each  group  may  violate  the  assumption  of 
independent  sampling;  however,  these  violations  were  necessary, 
and may not be severe. When assumptions are obviously violated,
Winter (1974) indicated that the linear discrimination model
simply  becomes  an  empirical  procedure,  which  although  it  may  not 
be  optimum,  may  be  satisfactory  from  a practical  viewpoint. 

Also, to counteract some of the effects of this violation on
the data, these components of variance were removed before
discriminant analysis was performed. There can be no doubt that
the  discriminant  model  will  find  and  highlight  the  measurement 
components  that  best  display  the  differences  between  groups,  as 
it  was  used  herein  as  part  of  an  empirical  procedure.  The 
procedure  should  be  validated  in  future  efforts. 

MEASUREMENT 

RAW DATA. Eighteen pilot/system performance parameters, shown in
Appendix A, were recorded on magnetic tape at a rate of five
times per second in real time from the beginning to the end of
training. The raw data were checked and packed onto 16 reels of
2400-foot, 9-track magnetic tape in binary format. These data
were processed after data collection was complete. Measures were
created from the raw data by computer programs designed to
execute the approach to measurement which had been previously
developed for NAVTRAEQUIPCEN by the authors.

MEASUREMENT APPROACH. A descriptive framework has been
established which relates system performance and human behavior to
segments of maneuvers constituting a training mission. This
descriptive structure has been called a measurement model. The
model permits the measurement of a variety of tasks and
performance dimensions in order to describe unique as well as common
aspects of maneuvers. To accomplish this, the model defines
each measure in terms of the following six determinants (which
are summarized in the paragraph below): (a) a maneuver segment;
(b) a parameter; (c) a sampling rate; (d) a desired value, if
required; (e) a tolerance value, if required; and (f) a
transformation.

A segment  is  any  portion  of  a maneuver  for  which  desired 
student  behavior  or  system  performance  follows  a lawful 
relationship  from  beginning  to  end,  and  for  which  the  beginning 
and  end  can  be  unambiguously  defined.  The  measurement  start  and 
stop conditions define a segment. A parameter is any
quantitative index of (a) vehicle states in any reference plane,
(b) personnel physiological states, (c) control device states,
or (d) discrete events. A sampling rate is the temporal
frequency  at  which  the  parameter  is  examined.  Frequently 
parameters  have  no  utility  unless  compared  to  a desired  value  or 
a tolerance  to  derive  an  error  score.  Finally,  a transformation 
is  any  mathematical  treatment  of  the  parameter,  to  include 
measures  of  central  tendency,  variability,  scalar  values,  Fourier 
transforms,  pilot/system  transfer  functions,  etc. 

The  reader  is  urged  to  take  careful  note  of  the  definition 
of  a measure  used  throughout  this  report;  a measure  is  the  end 
result  of  the  measure  production  process,  which  starts  with  a 
raw  data  parameter  and  ends  with  a specific  transformation  of 
that  parameter. 

Current measure-producing computer program functions for
defining measurement start/stop conditions and logically
combining start/stop expressions are shown in Appendix A.
Common measurement transformations available in the measurement
programs are also shown in Appendix A.

CANDIDATE  MEASURES.  The  raw  data  were  processed  by  the 
measurement  software  to  produce  candidate  measures  for  measure 
selection  analyses.  Candidate  measures  for  each  maneuver  are 
shown  in  tables  2 - 6.  The  tables  indicate,  from  left  to  right, 
the  parameter  variable  names  in  the  simulation  software,  the 
desired value(s), the transform names in the measurement software
and the measure abbreviations used throughout the report.
Segmentation  rules  are  noted  for  each  maneuver. 

Maneuver 4 was subdivided into three segments, numbered 2,
3 and 4. Segment 2, Initial Climb or Descent, started at the
beginning of the climbing or descending turn and continued until
a change in altitude had exceeded 1,000 feet, and heading had
changed from the initial value by more than 90 degrees. Segment
3, Climb or Dive and Turn Reversal, started at the end of Segment
2, and continued until altitude had returned within 1,000 feet of
the initial altitude. Segment 4, Final Climb or Descent, started
when altitude was within 1,000 feet of the initial altitude and
heading was within 90 degrees of the initial value, and ended at
the end of the maneuver as defined by the IFM program.

*One-half of the trials were at 350-knots IAS, the other half at 280-knots

TABLE 3. CANDIDATE MEASURES FOR MANEUVER 2, CLIMBS AND DESCENTS

TABLE 5. CANDIDATE MEASURES FOR MANEUVER 4, CLIMBING AND DESCENDING TURNS

One-half of the trials started with right, climbing turns, then reversed to left
descending turns. The desired values were changed appropriately as a function of

MEASURE SELECTION ANALYSES

Measure selection analyses were performed by univariate
and multivariate techniques. The results were interpreted by
the  investigators  and  merged  into  a composite  measure  set  for 
each  maneuver.  Final  discriminant  analyses  were  performed  to 
determine  the  relative  weights  of  the  recommended  measures. 

UNIVARIATE  SELECTION.  Considering  each  measure  independent  of 
all  other  measures,  the  average  value  of  each  measure  on  a 
given day was compared to the average value of that measure on the
criterion  day.  A t-test  was  used  to  determine  statistically 
significant  differences.  The  means  on  Days  1,  3,  and  5 were 
tested  against  Day  7 for  performance  changes  during  training 
without  turbulence  and  with  a light  aircraft.  The  means  for  Days 
2 and  4 were  tested  against  Day  6 for  performance  changes  during 
training  with  a heavy  aircraft.  Day  8 means  were  tested  against 
Day  7 means  to  find  the  significant  changes  caused  by  the 
addition  of  light  turbulence.  Day  9 means  were  compared  to  Day  7 
means  to  determine  the  measure  set  changes  brought  about  by  the 
addition  of  both  light  turbulence  and  a heavy  aircraft. 
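
The univariate screen can be illustrated with a short Python sketch. The report says only that "a t-test" was used; the pooled-variance, independent-samples form below is an assumption, as are the measure names, the sample values and the critical value. It is not the original program.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (an assumed form of the test)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))


def select_by_t(day_scores, criterion_scores, t_critical):
    """Return the measures whose |t| against the criterion day exceeds t_critical."""
    return [
        name
        for name in day_scores
        if abs(pooled_t(day_scores[name], criterion_scores[name])) > t_critical
    ]


# Hypothetical measures and scores, for illustration only.
day1 = {"ALT_ERR": [5.0, 6.0, 7.0, 8.0], "HDG_ERR": [1.0, 2.0, 3.0, 4.0]}
day7 = {"ALT_ERR": [1.0, 2.0, 3.0, 4.0], "HDG_ERR": [1.0, 2.0, 3.0, 4.0]}

selected = select_by_t(day1, day7, t_critical=2.0)  # 2.0 is a placeholder
```

Here only the hypothetical ALT_ERR measure changes between the two days, so it is the only one selected. Each measure is screened on its own, which is why the multivariate step is still needed to deal with redundancy among measures.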

DISCRIM SELECT. Computer programs have been generated to select
measures through multiple discriminant analyses (cf. Cooley and
Lohnes, 1971). These analyses assume that a battery of measures
has been taken for each of a number of groups of participants.
The primary purpose of DISCRIM SELECT is to isolate the measures
that best discriminate between groups. For example, a pair of
groups may consist of experienced and inexperienced participants;
the procedure adopted discards measures that do not contribute to
such discriminations when all measures are considered together
as a set.

A data  editing  and  sorting  routine  was  added  to  the  initial 
part  of  DISCRIM  SELECT  in  order  to  facilitate  the  components 
of  variance  removal  programs.  (See  figure  1.)  The  components  of 
variance  programs  required  that  the  data  be  sorted  according  to 
subject, trial, day and maneuver and that all erroneous data be
predetermined so that matching cells could be formed across the days
used in the analysis. For example, if the data for Subject 1,
Day 1, and Maneuver 3 were erroneous in an analysis of Day 1
paired with Day 7, neither Day 1 nor Day 7 data for Subject 1 and
Maneuver 3 would be present in the analysis.
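
The cell-matching rule in the example above can be sketched as follows. The data layout (a dictionary keyed by subject, day, maneuver and trial) and the function name are assumptions made for illustration; the original sorting routine is not reproduced in the report.

```python
def matched_cells(data, bad_cells, day_a, day_b):
    """Keep only data for the two compared days, dropping unmatched cells.

    data      -- dict mapping (subject, day, maneuver, trial) to a score
    bad_cells -- set of (subject, day, maneuver) cells flagged as erroneous
    A subject/maneuver pair that is erroneous on either day is removed
    from both days, so the remaining cells match across the comparison.
    """
    days = (day_a, day_b)
    dropped = {(s, m) for (s, d, m) in bad_cells if d in days}
    return {
        key: value
        for key, value in data.items()
        if key[1] in days and (key[0], key[2]) not in dropped
    }


# Hypothetical scores: two subjects, Days 1 and 7, Maneuvers 2 and 3.
data = {
    (subject, day, maneuver, 1): float(subject + day + maneuver)
    for subject in (1, 2)
    for day in (1, 7)
    for maneuver in (2, 3)
}
clean = matched_cells(data, bad_cells={(1, 1, 3)}, day_a=1, day_b=7)
```

In this example the Day 7 data for Subject 1, Maneuver 3 are removed along with the erroneous Day 1 data, while Subject 1's Maneuver 2 data and all of Subject 2's data are kept.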

Two  programs  were  designed  to  remove  from  the  data  the 
effects of observing the same subjects in all conditions and
the  effects  of  observing  the  same  subject  twelve  times  in  each 
condition.  Both  programs  subtracted  the  components  of  variance 
from  each  data  point.  RMEAS  subtracted  the  effect  of  observing 
the same subject on both days (as suggested by Schori, 1972):

$$X'_{mikL} = X_{mikL} - \left( \frac{1}{KR} \sum_{k=1}^{K} \sum_{L=1}^{R} X_{mikL} - \frac{1}{NKR} \sum_{i=1}^{N} \sum_{k=1}^{K} \sum_{L=1}^{R} X_{mikL} \right)$$

where:  m - Variables           M - No. of variables
        i - Subjects            N - No. of subjects
        k - Groups              K - No. of groups
        L - Observations/day    R - No. of observations/day

REPM subtracted the effect of observing the same subject
more than once in a day using:

$$X''_{mikL} = X_{mikL} - \left( \frac{1}{R} \sum_{L=1}^{R} X_{mikL} - \frac{1}{NKR} \sum_{i=1}^{N} \sum_{k=1}^{K} \sum_{L=1}^{R} X_{mikL} \right)$$


Both  of  these  operations  were  performed  before  any  other 
statistical  analysis. 
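
A minimal Python sketch of this kind of correction is given below for one measure. It assumes that the RMEAS adjustment subtracts, from every score, the amount by which that subject's own mean (over both days and all observations) departs from the grand mean. That reading, the data layout and the function names are assumptions, not the original programs.

```python
from collections import defaultdict
from statistics import mean


def remove_component(scores, cell_of):
    """Subtract (cell mean - grand mean) from every score.

    scores  -- dict mapping (subject, group, observation) to a score
    cell_of -- function that maps a key to the cell whose mean is removed
    """
    grand = mean(scores.values())
    cells = defaultdict(list)
    for key, value in scores.items():
        cells[cell_of(key)].append(value)
    cell_mean = {cell: mean(values) for cell, values in cells.items()}
    return {
        key: value - (cell_mean[cell_of(key)] - grand)
        for key, value in scores.items()
    }


def rmeas(scores):
    """Assumed RMEAS-style correction: the cell is the subject."""
    return remove_component(scores, lambda key: key[0])


# Hypothetical scores for 2 subjects, 2 days, 2 observations per day.
scores = {
    (1, 1, 1): 10.0, (1, 1, 2): 12.0, (1, 2, 1): 8.0, (1, 2, 2): 10.0,
    (2, 1, 1): 4.0, (2, 1, 2): 6.0, (2, 2, 1): 2.0, (2, 2, 2): 4.0,
}
adjusted = rmeas(scores)
```

After the adjustment both subjects have the same mean, so subject-to-subject differences no longer inflate the within-group variance, while the difference between the two day means is unchanged. A REPM-style correction for repeated observations within a day would use a different cell definition with the same helper.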


Many measures were transforms of closely related parameters.
Highly correlated measures were eliminated in order to reduce
redundant information, and to avoid computation problems which
were experienced with trial data when intercorrelations greater
than r=.95 existed in the candidate measure sets. The criterion
for dropping one member of a highly correlated pair was
established by tests with r=.95, r=.90 and r=.80; r=.90 was
selected because it appeared to eliminate obvious redundancies,
yet left a reasonable number of measures for subsequent analyses.
Since measure transforms were ordered, generally, from easiest to
most difficult to compute in the candidate lists, the procedure
adopted was to drop the more difficult to compute member of an
intercorrelated pair for a given maneuver and analytic comparison.
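
The pruning rule can be sketched in Python as follows. The sketch assumes that the candidate list is already ordered from easiest to most difficult to compute, that the absolute value of the correlation is compared to the threshold, and that each measure is compared only against measures already kept. These details, and all names and data, are assumptions for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def prune_correlated(measures, threshold=0.90):
    """Drop the later (harder to compute) member of each correlated pair.

    measures -- list of (name, values), ordered easiest to hardest to compute
    Returns (kept_names, dropped_names).
    """
    kept, dropped = [], []
    for name, values in measures:
        if any(abs(pearson_r(values, kv)) > threshold for _, kv in kept):
            dropped.append(name)
        else:
            kept.append((name, values))
    return [name for name, _ in kept], dropped


# Hypothetical candidate measures: B is nearly a rescaled copy of A.
candidates = [
    ("A", [1.0, 2.0, 3.0, 4.0, 5.0]),
    ("B", [2.0, 4.0, 6.0, 8.0, 10.5]),
    ("C", [5.0, 1.0, 4.0, 2.0, 3.0]),
]
kept, dropped = prune_correlated(candidates)
```

In this example B is dropped because it correlates above .90 with the easier measure A, and C is kept because its correlation with A is only -.30.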

DISCRIM SELECT iteratively discarded measures until a
minimum set of measures resulted. The iterative process stopped
when either one of two criteria was met: (a) the total number of
remaining measures was less than the minimum number of Factors
required to describe the variance as determined by a Principal
Components Analysis, or (b) discarding another measure would
have reduced the overall discrimination to an unacceptable level.

Two tolerances associated with the above criteria had to be
specified by the investigators: (a) the minimum percent variance
to be accounted for by any Factor, and (b) the minimum
measurement communality. Communality was the amount of variance a
particular measure contributes to all discriminant functions.

The tolerances were set by trial analyses with maneuver
one data. It was found that between 90 and 95 percent of the
original variance was retained when the minimum variance for any
Factor was set at 7 percent; this tolerance set the minimum
possible measure set size to equal the minimum number of
"significant" Factors. Trial analyses also revealed that in
most cases measures which exhibited communalities less than .300
were non-significant contributors to the discriminant function,
as shown by the Multivariate Analysis of Variance (included in
DISCRIM SELECT software). Minimum communality was set at .300.
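
The effect of the 7 percent tolerance can be illustrated with a short Python sketch. It assumes that each Factor's share of variance is its eigenvalue divided by the sum of all eigenvalues; the eigenvalues shown are invented for illustration and are not results from the study.

```python
def significant_factors(eigenvalues, min_share=0.07):
    """Count Factors whose share of total variance meets the tolerance.

    Returns (number of Factors kept, proportion of variance retained).
    """
    total = sum(eigenvalues)
    kept = [value for value in eigenvalues if value / total >= min_share]
    return len(kept), sum(kept) / total


# Hypothetical eigenvalues from a principal components analysis.
n_factors, retained = significant_factors([4.0, 3.0, 2.0, 0.6, 0.4])
```

With these invented values three Factors pass the tolerance and retain 90 percent of the variance, so the smallest measure set the procedure could return would contain three measures.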

The flow diagram for DISCRIM SELECT is shown in figure 1;
each  block  is  described  in  the  following: 

1.  Read  tolerances  and  measure  tables.  The  level  of 
correlation  for  the  initial  removal  of  equivalent  measures, 
and  the  labels  for  each  measure,  were  read  from  punched  cards 
at  the  beginning  of  the  program.  The  two  additional  criteria 
were  read  for  DISCRIM  SELECT,  (a)  the  minimum  variance  and 
(b)  the  minimum  communality. 

2.  List  initial  measure  set.  The  initial  measure  set  was 
listed  by  number  and  name  of  each  measure. 

3.  Sort  all  data  in  the  selected  groups  according  to 
subject,  day,  maneuver  and  trial.  Match  cells  when  rejecting 
erroneous  data. 

4.  Perform  removal  of  components  of  variance  to  correct  for 
repeated  observations. 

5.  Combine  data  from  two  selected  groups.  Measures  from 
one  time  in  training  were  to  be  compared  to  the  same  measures 
taken  at  another  time  in  training.  The  measures  from  each 
training  day,  or  each  group,  were  brought  together  into  a common 
data  file  so  that  the  same  types  of  measures  could  be  compared 
observation  by  observation. 

6. Generate correlation matrix. Each measure was
correlated with every other measure to form an intercorrelation
matrix.


7.  Remove  highly  correlating  measures.  One  member  of  a 
pair  of  measures  was  removed  from  further  analysis  when  the 
correlation coefficient in the matrix exceeded 0.90. The
candidate measures were ordered, generally, from least to most
difficult to compute. The more difficult to compute transform of
a measure-pair  was  dropped.  No  measure  was  removed  for  reason  of 
high  correlation  if  the  high  correlation  coefficient  occurred 
between  two  measures  taken  at  different  points  in  training. 

8.  List  measures  kept  and  dropped.  The  measures  were 
again  listed  in  two  columns,  one  column  for  those  kept  for 
further analysis, and the other column for those which were removed
from  the  analysis. 

9. Perform principal components analysis and rotations,
which produced the following outputs:

a. Factor structure for each group (Day) of the
comparison, which provided evidence of performance
dimensions.

b. The percent variance explained by each Factor,
degrees-of-freedom and CHI SQUARE, which aided in the
assessment of significant factors. The percent of
variance explained by each factor was used (in
Step 12) to establish the minimum number of
measures.

c.  VARIMAX  rotations  which  were  used  to  present  the 
principal  dimensions  of  performance  changes. 

10. Perform multivariate analysis of variance (MANOVA),
producing the following output for use by the discriminant
analysis:

a. Means and standard deviations by group.

b. A test for equality of dispersions.

c. Univariate F-ratios for each measure used to
establish reasonable grounds for the communality
measure rejection criterion (Step 11).

d. Multivariate test of significance, Wilks' Lambda
and F-ratios.

11.  Perform  multiple  discriminant  analysis  (DISCRIM), 
producing  the  following  information: 

a.  Multivariate  test  of  significance,  Wilks'  Lambda 
and F-ratios (a check on 10.d).

b. CHI-SQUARE with successive roots removed provided
evidence of the statistical significance of the
discriminant function (since only one discriminant
function was generated).

c. Measure coefficient vectors, the weights to combine
the measures into the discriminant function.

d. Communalities, the proportion of variance
(associated with each measure) extracted by all discriminant
functions; as noted previously, communality was the
basis of removing measures from the set.

e.  Group  centroids  in  discriminant  space  revealed  the 
group  mean  position  on  the  discriminant  function. 


NAVTRAEQUIPCEN  74-00063-1 

12. If the number of measures remaining was less than or
equal to the number of significant factors, iterative measure
elimination  ceased  and  the  program  branched  to  Block  16.  If  the 
number  of  measures  was  greater  than  the  number  of  significant 
factors,  the  program  continued  to  the  next  test. 

13. All remaining measure communalities were tested against
the experimenter-specified minimum communality (set at 0.30 in
this study). If no communalities were less than criterion, the
program terminated through Block 16. If there were remaining
communalities  less  than  criterion,  iterative  measure  elimination 
continued. 

14.  The  measure  with  the  least  communality  was  found  and 
eliminated  from  the  set  and  correlation  matrix. 

15.  The  measures  kept  and  dropped  were  listed,  and  the 
analysis was recomputed starting at Block 10.

16.  The  final  measures  retained  and  those  dropped  in  order 
of  elimination  were  listed. 

17.  Perform  "ridge"  analysis  by  iteratively  adding  bias  to 
W matrix  and  reperforming  DISCRIM. 

18.  A final  principal  components  analysis  (and  rotations) 
was  performed  to  show  the  ending  factor  structure. 
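
The control flow of Blocks 10 through 16 can be sketched in Python as shown below. The discriminant analysis itself is not implemented; it is represented by a caller-supplied function that returns a communality for each remaining measure. That function, the fixed factor count and all names are hypothetical stand-ins, not the original software.

```python
def discrim_select(measures, communalities_fn, n_significant_factors,
                   min_communality=0.30):
    """Iteratively drop the measure with the least communality.

    measures              -- list of measure names
    communalities_fn      -- stand-in for MANOVA/DISCRIM (Blocks 10-11):
                             takes the kept names, returns {name: communality}
    n_significant_factors -- assumed fixed, from the principal components step
    Returns (kept, dropped), with dropped in order of elimination.
    """
    kept, dropped = list(measures), []
    while True:
        comm = communalities_fn(kept)               # Blocks 10-11
        if len(kept) <= n_significant_factors:      # Block 12
            break
        weakest = min(kept, key=lambda name: comm[name])
        if comm[weakest] >= min_communality:        # Block 13
            break
        kept.remove(weakest)                        # Block 14
        dropped.append(weakest)                     # Block 15, then repeat
    return kept, dropped                            # Block 16


# Hypothetical fixed communalities; a real analysis would recompute them.
FAKE = {"A": 0.8, "B": 0.1, "C": 0.5, "D": 0.2}


def fake_communalities(names):
    return {name: FAKE[name] for name in names}


kept, dropped = discrim_select(["A", "B", "C", "D"], fake_communalities, 2)
```

In this toy run B and then D are eliminated, and the loop stops because only two measures remain for two significant factors. In the real procedure the communalities would change after each elimination, which is why the analysis is recomputed on every pass.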

The resulting set was examined to ensure that all vehicular
outer  loops  which  represented  task  instructions  (such  as  hold 
heading,  airspeed  and  altitude)  were  represented.  If  outer  loop 
measures  were  dropped  during  iterative  analyses,  they  were  added 
back  into  the  recommended  set. 

Finally, DISCRIM SELECT was modified to perform an analysis
on only the recommended measure set in order to assure that a
significant discriminant function was retained, and to compute
the weights assigned to each measure of the final set for
combining data into a single score, the discriminant function,
for each maneuver, segment and day comparison group.
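
Combining the final measures into one score amounts to a weighted sum. A minimal Python sketch follows; the weights and measure names are invented for illustration, and any scaling or centering applied by the original software is not known from the text.

```python
def discriminant_score(weights, measures):
    """Weighted sum of measure values using discriminant coefficients."""
    return sum(weights[name] * measures[name] for name in weights)


# Hypothetical coefficients and one trial's measure values.
weights = {"ALT_ERR": 0.5, "HDG_ERR": -2.0}
trial = {"ALT_ERR": 4.0, "HDG_ERR": 1.0}
score = discriminant_score(weights, trial)
```

A separate set of weights would be needed for each maneuver, segment and day comparison, as described above.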

In order to explore the reliability/stability of the
discriminant model, DISCRIM SELECT was also modified to add a bias
(in 0.1 increments) to the diagonal of the W matrix and then
reperform DISCRIM under operator control after the recommended
measure set was determined. This is referred to in the flow
chart and was similar to "Ridge" regression analysis (Hoerl and
Kennard, 1970).
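
The ridge-style check can be sketched in Python as follows. The sketch assumes a two-group comparison, for which the discriminant weights are proportional to the inverse of the (biased) W matrix times the difference between group mean vectors. That formulation, the small solver and the numbers are assumptions for illustration, not the original program.

```python
def add_ridge_bias(w, bias):
    """Return a copy of square matrix w with bias added to its diagonal."""
    return [
        [value + (bias if r == c else 0.0) for c, value in enumerate(row)]
        for r, row in enumerate(w)
    ]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ridge_weights(w, mean_diff, biases):
    """Discriminant weights for each bias value added to the W diagonal."""
    return {bias: solve(add_ridge_bias(w, bias), mean_diff) for bias in biases}


# Hypothetical W matrix for two highly correlated measures.
W = [[2.0, 1.9], [1.9, 2.0]]
MEAN_DIFF = [1.0, 0.5]
weights_by_bias = ridge_weights(W, MEAN_DIFF, [0.0, 0.1, 0.2])
```

With these invented numbers the weights shrink quickly as bias is added, which is the kind of instability such a check is meant to expose when measures are highly intercorrelated.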

SECTION III

MEASURE  SELECTION  RESULTS  AND  DISCUSSION 


Candidate  measure  sets  were  created  differently  for  each  of 
the  instrument  flight  maneuvers  to  reflect  different  dimensions 
of control and different criteria of performance. Results are
presented for each maneuver. Within each maneuver there were
four day-comparisons, which represented changes in task
complexity. The first two day-comparisons sought the measures
which would reveal performance changes from initial to final
training (a) with no stress, that is, light aircraft, forward C.G.
and no turbulence (Day 1 vs Day 7), and (b) with heavy aircraft,
aft C.G. and no turbulence (Day 2 vs Day 6). The third and
fourth day-comparisons sought the measure set changes required
by the addition of (a) turbulence only (Day 7 vs Day 8) and (b)
turbulence combined with a heavy aircraft and aft C.G. (Day 7 vs
Day 9).

Summary  data  are  presented  in  this  section  in  accordance 
with four steps in the measure selection process: (a) means and
t-tests, (b) removal of equivalent measures, (c) multiple
discriminant  selection  analyses  (DISCRIM  SELECT),  and  (d)  the 
recommended  measures  and  weighting  coefficients  for  summing  the 
set  into  one  composite  score  for  each  maneuver. 

MEANS AND t-TESTS

The average values of each measure for every maneuver and
segment are presented in Appendix B for each training day.
Almost all of the measures exhibited a reduction in error as a
function of training day, which lent face validity to the
training sensitivity of the initial candidate measure set. Each
of the day-comparisons was tested for significant differences
by t-tests. Those measures which were significantly different
for  each  of  the  comparisons  were  selected  as  contributors  to  the 
training  sensitive  measure  set. 

Results are summarized in table 7. Generally, more
measures were selected for less complex tasks (Maneuvers 1 and 2)
than for the more complex Maneuvers 3 and 4. Since Maneuvers 1
and 2 were felt to be less demanding than Maneuvers 3 and 4,
these data suggested that either a sufficient set for the more
complex tasks was not constructed, or there were more redundant
forms of measurement in the first two maneuvers.

EQUIVALENT  MEASURES 

The number of equivalent measures for each maneuver and
day-comparison is shown in table 8. Measures which intercorrelated
greater than r=.90 were considered to be equivalent in this and
subsequent analyses, and therefore could be substituted for one
another. It was noted that more equivalent measures were found
in Maneuvers 1 and 2 than in the remaining maneuvers.
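
The equivalence rule can be illustrated with a short sketch. It is not the study's program: the data are invented, and keeping the first-listed member of each equivalent group is an assumption made here for simplicity (the report states only that equivalent measures could be substituted for one another).

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_equivalent(measures, r_limit=0.90):
    """Keep a measure only if it correlates at or below r_limit with every measure already kept."""
    kept = []
    for name, values in measures.items():
        if all(pearson_r(values, measures[k]) <= r_limit for k in kept):
            kept.append(name)
    return kept


# Hypothetical trial-by-trial values; PTSD tracks PTRG almost exactly.
data = {
    "PTRG": [1.0, 2.0, 3.0, 4.0, 5.0],
    "PTSD": [1.1, 2.0, 3.2, 3.9, 5.1],
    "ASAA": [5.0, 1.0, 4.0, 2.0, 3.0],
}
remaining = drop_equivalent(data)
print(remaining)
```

Because each measure is compared only with those already kept, long chains of equivalent forms collapse to a single representative.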
TABLE 7. NUMBER OF MEASURES SELECTED BY t-TESTS

                             DAY 1 VS   DAY 2 VS   DAY 7 VS   DAY 7 VS    ROW
MANEUVER            (NCM)¹    DAY 7      DAY 6      DAY 8      DAY 9     MEAN²

1.   St. & Level     (23)       21         20         13         14        17
2.   Climbs/Dives    (19)       18         17         13         13        15
3.   Level Turns     (17)       13         11          9         10        11
4-2. Initial CDT³    (14)        6          6          6          6         6
4-3. CDT Reversal    (11)        3          7          5          6         5
4-4. Final CDT       (14)       11          8          6          9         9

Column Mean²                    12         12          9         10        10

                                NO        AFT        ADD       ADD AFT
                              STRESS      C.G.    TURBULENCE    C.G. &
                                                              TURBULENCE

TABLE 8. NUMBER OF EQUIVALENT MEASURES⁴

                             DAY 1 VS   DAY 2 VS   DAY 7 VS   DAY 7 VS    ROW
MANEUVER            (NCM)¹    DAY 7      DAY 6      DAY 8      DAY 9     MEAN²

1.   St. & Level     (23)       10          8          8          8         9
2.   Climbs/Dives    (19)        7          6          7          8         7
3.   Level Turns     (17)        3          2          3          4         3
4-2. Initial CDT³    (14)        5          1          1          4         3
4-3. CDT Reversal    (11)        4          3          3          3         3
4-4. Final CDT       (14)        2          0          1          2         1

Column Mean²                     5          3          4          5         4

                                NO        AFT        ADD       ADD AFT
                              STRESS      C.G.    TURBULENCE    C.G. &
                                                              TURBULENCE

¹NCM = Number of Candidate Measures
²Means were rounded to the nearest whole number
³CDT = Climbing and Diving Turns
⁴A given measure may be equivalent to more than one measure

The composition of the equivalent measure forms is shown
in Appendix C. Maneuvers 1 and 2 produced large chains of
equivalent measures. In particular, pitch axis control range
(ELRG) and the resulting angle-of-attack and pitch attitude
measures (ALRG, ALSD, PTRG and PTSD) were highly correlated
throughout Maneuver 1. Another cluster of redundant forms
appeared for altitude and altitude rate (HRG, HDAA and HDRG).
Roll absolute error and rms (ROAA and RORM) were equivalent for
Maneuvers 1 and 2. Aileron and pedal displacement (AIF2 and
PDF2) and resulting sideslip (BERM) were equivalent during
Maneuver 2, climbs and dives.

The climbing and diving turn reversal segment was quite
interesting because elevator stick, aileron stick and pedal
crossover power (ELF1, AIF1 and PDF1) were equivalent to each
other and to the final roll attitude value achieved (ROAF).
Aileron and pedal displacement (AIF2 and PDF2) were equivalent
only during training under no stress conditions, Day 1 vs 7.

The equivalent measures analysis served as a valuable first
step filter to eliminate unnecessary measurement. The number of
measures remaining after removal of equivalent forms is shown
in table 9. The multivariate measure selection analyses which
followed received a maximum of 15 measures to operate upon.
Given 144 observations for each measure, the worst case (15
measures) for multivariate analyses produced 9.5 observations
per measure, which was within the limits set by Lane (1971),
making the assumption that observations x subjects were
equivalent to subjects alone after the "components" of variance
were removed.


TABLE 9. NUMBER OF MEASURES REMAINING FOR MULTIVARIATE ANALYSES

                             DAY 1 VS   DAY 2 VS   DAY 7 VS   DAY 7 VS    ROW
MANEUVER            (NCM)¹    DAY 7      DAY 6      DAY 8      DAY 9     MEAN²

1.   St. & Level     (23)       15         15         15         15        15
2.   Climbs/Dives    (19)       12         13         12         11        12
3.   Level Turns     (17)       13         14         13         13        13
4-2. Initial CDT³    (14)        9         13         13         10        11
4-3. CDT Reversal    (11)        7          8          8          7         8
4-4. Final CDT       (14)       12         14         13         12        13

Column Mean²                    11         13         12         11        12

                                NO        AFT        ADD       ADD AFT
                              STRESS      C.G.    TURBULENCE    C.G. &
                                                              TURBULENCE

¹NCM = Number of Candidate Measures
²Means were rounded to nearest whole number
³CDT = Climbing and Diving Turns

DISCRIM SELECT

The  components  of  variance  removal  routines  increased  the 
symmetry  of  the  raw  data  while  preserving  the  group  centroids. 
With  the  improved  dispersions  of  the  data  and  the  effects  of 
repeated  observations  removed,  the  basic  assumptions  of  multi- 
variate discriminant  analysis  were  met.  The  relationship  of  the 
measures  between  days  then  could  be  determined  more  accurately. 

The multiple discriminant analysis iteratively reduced the
measure sets, removing measures which contributed little to the
discriminant function. Measures with low communalities (less
than .30) were dropped, one at a time, and the process was
repeated. Iteration continued until there were no measures left
with low communalities, or the number of measures was equal to
the number of factors which accounted for more than seven percent
of the variance in the initial measure set.
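
The stopping rule can be sketched as a simple loop. This is an outline under stated assumptions, not DISCRIM SELECT itself: `communality_fn` is a hypothetical stand-in for re-running the discriminant analysis on the current measure set, `n_factors` is assumed to have been counted beforehand from the initial set, and dropping the measure with the lowest communality first is an assumption about the order of removal.

```python
def reduce_measures(names, communality_fn, n_factors, cutoff=0.30):
    """Drop low-communality measures one at a time until either stop rule is met."""
    current = list(names)
    while len(current) > n_factors:
        comm = communality_fn(current)            # re-analyze the current set
        worst = min(current, key=lambda m: comm[m])
        if comm[worst] >= cutoff:                 # no low communalities remain
            break
        current.remove(worst)                     # drop one measure, then repeat
    return current


# Hypothetical communalities; a real analysis would recompute them on each pass.
FIXED = {"ALRG": 0.73, "PTSD": 0.71, "ROAA": 0.06, "PDF2": 0.28, "ASAA": 0.67}


def fixed_communalities(current):
    return {m: FIXED[m] for m in current}


reduced = reduce_measures(list(FIXED), fixed_communalities, n_factors=2)
print(reduced)
```

With `n_factors=4` the same data stop after a single removal, because the number of measures then equals the number of factors even though one low communality remains.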

After elimination of redundant measures, DISCRIM SELECT
further reduced the candidate measures to an overall average of
six measures for each day-comparison and maneuver. The data
in table 10 illustrate that slightly more measures were required
during Maneuvers 1 and 2 than during the remaining comparisons.
Only three to five measures were sufficient to describe
performance changes due to adding turbulence (Day 7 vs 8) and turbulence
combined with a heavy aircraft (Day 7 vs 9) as task stressors.


TABLE 10. NUMBER OF MEASURES IN EACH MINIMUM DISCRIMINATING SET

                      DAY 1 VS   DAY 2 VS   DAY 7 VS   DAY 7 VS    ROW
MANEUVER               DAY 7      DAY 6      DAY 8      DAY 9     MEAN¹

1.   St. & Level         12         11          3          5         8
2.   Climbs/Dives        12          9          5          4         8
3.   Level Turns          9          6          5          4         6
4-2. Initial CDT²         6          5          5          4         5
4-3. CDT Reversal         5          5          4          5         5
4-4. Final CDT            5          4          5          4         5

Column Mean¹              8          7          5          4         6

                         NO        AFT        ADD       ADD AFT
                       STRESS      C.G.    TURBULENCE    C.G. &
                                                       TURBULENCE

¹Means were rounded to nearest whole number
²CDT = Climbing and Diving Turns




NAVTRAEQUIPCEN  74-C-0063-1 


The composition of the minimum discriminant set changed as
a function of training (Day 1 vs 7 and Day 2 vs 6 taken
together) and as a function of task stressors (Day 7 vs 8 and
Day 7 vs 9 taken together), as illustrated below:

Measure Type           Training   Stressors   Overall

Control Input             26%        56%        38%
System Performance        72%        37%        58%
Elapsed Time               2%         7%         4%

Control  input  (stick,  pedal  and  throttle)  measures  represented 
26  percent  of  the  minimum  measures  during  training  and  56 
percent  of  the  minimum  measures  which  describe  performance 
changes  due  to  task  stressor  changes. 


Although  it  was  important  information  that  turbulence  alone 
and  interacting  with  aft  C.G.  caused  a measure  set  change 
primarily in the control input measures, there was no rational
way to justify the use of the resulting discriminant function
for control of automated training under these conditions. Since
turbulence  alone  and  with  aft  C.G.  were  not  measured  early  in 
training,  the  discriminant  function  could  not  be  sensitive  to 
the  skill  change  throughout  training  for  these  conditions. 

Therefore,  the  recommended  measures  and  weights  which  follow 
were  restricted  to  light  or  heavy  (aft  C.G.)  aircraft  conditions. 

RECOMMENDED  MEASURES  AND  WEIGHTS 

DISCRIM SELECT was altered to perform an analysis of the
recommended measures for the purpose of stabilizing the beta
weights. The results of the "ridge" reanalysis are shown
alongside the non-biased results in tables 11 through 15 with
the bias (k) value shown. Values from 0.1 to 0.5 were found
to reduce the exaggerated weights as much as 70-80% without
affecting the canonical R² or chi-squared significantly, or
altering the group means and standard deviations in discriminant
space significantly (table 15).

The  resulting  weights  from  data  biasing  were  considered 
the  most  stable  model  for  use  in  the  automated  training  system. 
The  weights  can  be  used  directly  to  sum  the  measures  into  a 
single score for use by the adaptive logic (which requires a
single score for performance assessment).
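
The effect of the bias can be illustrated for the two-group case. The sketch below is not the modified DISCRIM SELECT program. It assumes the textbook two-group form, in which the weights are proportional to (W + kI)⁻¹ times the difference between group mean vectors, with W taken here as the pooled within-group covariance matrix; the scaling used in the report is not reproduced, and all data and function names are invented for illustration.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def group_mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def within_matrix(groups):
    """Pooled within-group covariance matrix (assumed form of the 'within' matrix)."""
    p = len(groups[0][0])
    w = [[0.0] * p for _ in range(p)]
    dof = 0
    for rows in groups:
        mu = group_mean(rows)
        dof += len(rows) - 1
        for row in rows:
            d = [v - c for v, c in zip(row, mu)]
            for i in range(p):
                for j in range(p):
                    w[i][j] += d[i] * d[j]
    return [[v / dof for v in r] for r in w]


def ridge_weights(early, late, k=0.0):
    """Two-group discriminant weights with bias k added to the diagonal of W."""
    w = within_matrix([early, late])
    for i in range(len(w)):
        w[i][i] += k
    diff = [b - a for a, b in zip(group_mean(early), group_mean(late))]
    return solve(w, diff)


def composite(weights, measures):
    """Sum the weighted measures into a single score."""
    return sum(wt * x for wt, x in zip(weights, measures))


# Two invented, nearly redundant measures on an early and a late training day.
early = [[10.0, 10.1], [12.0, 11.9], [11.0, 11.2], [13.0, 12.8]]
late = [[5.0, 5.2], [6.0, 5.9], [7.0, 7.1], [6.0, 6.0]]
w_plain = ridge_weights(early, late, k=0.0)
w_ridge = ridge_weights(early, late, k=0.4)
print(w_plain, w_ridge, composite(w_ridge, [6.5, 6.4]))
```

With nearly redundant measures the unbiased weights are large and of opposite sign; adding k to the diagonal shrinks them toward values of similar size and sign, which is the stabilizing behavior described above.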
TABLE 11. RECOMMENDED MEASURES AND WEIGHTS FOR
MANEUVER 1, STRAIGHT AND LEVEL

Segment: Whole Trial. Values per aircraft condition: weight at K¹=0.0,
weight at the biased K¹, COMM². Six-value rows: Light A/C / Heavy A/C.

MEAS      VALUES

ELRG        .997    1.186   .84   /     .985     .907   .74
ELF1       8.608    1.024   .25
AIRG        .374     .502   .76   /    -.109    -.011   .46
PDRG       3.027    2.451   .35   /    7.940    5.671   .40
PTRG       -.442    -.456   .80
PTSD      -1.440    -.719   .71
ROAA        .271     .279   .52
PSRM        .076     .142   .47   /     .265     .279   .46
PSRG       -.023    -.047   .45   /    -.152    -.131   .49
HAA       12.441    1.193   .64   /    8.259    2.131   .46
HRG       -4.829    -.321   .71   /    3.512    2.342   .62
HDAA       2.780    2.108   .82
ASAA        .012   -.0001   .48   /     .058     .109   .53
ASRG        .049     .050   .63   /     .045     .040   .57

R²          .731     .688         /     .455     .412
χ²           336      298         /      170      149

¹K is the bias added to the diagonal of the "within" matrix in
the discriminant analysis to stabilize the weights. Where two
values are shown, the recommended weights are below the highest
k-value.

²Communality (the amount of variance of each measure extracted by
all discriminant functions) shown is associated with the highest
k-value.

TABLE 12. RECOMMENDED MEASURES AND WEIGHTS FOR
MANEUVER 2, CLIMBS AND DIVES

Segment: Whole Trial. Values per aircraft condition: weight at K=0.0,
weight at K=0.4, COMM. Six-value rows: Light A/C / Heavy A/C.

MEAS      VALUES

ELF1      10.640    1.120   .43
ELF2      -1.753     .009   .74   /    2.528    1.069   .62
ALRG       -.512     .078   [illegible]  /  .167   .182   .67
ALSD       5.403    1.871   .91   /    -.673     .269   .81
PTSD       3.281    2.958   .92   /    2.041    1.301   .79
HDAA      -1.955    -.453   .76   /    -.801    -.332   .49
AIF2       3.527     .611   .50
ROAA       -.004    -.002   .47   /     .021     .030   .53
PDF2       -.306     .276   .31
PSAA        .156     .170   .48   /     .095     .120   .41
TURM       -.022     .017   .74   /     .451     .464   .78
ASAA        .024     .040   .48   /     .117     .116   .51

R²          .671     .637         /     .604     .599
χ²           285      259         /      261      257

See footnote, table 11
TABLE 13. RECOMMENDED MEASURES AND WEIGHTS FOR
MANEUVER 3, LEVEL TURNS

Segment: Whole Trial. Values per aircraft condition: weight at K=0,
weight at K=.5, COMM. Six-value rows: Light A/C / Heavy A/C.

MEAS      VALUES

ELF2      -1.255     .075   .47
ALRG       -.449     .188   .73   /    -.165    -.126   .35
ALSD       8.670    4.362   .82
PTSD       -.107     .141   .71   /    1.560    1.466   .68
AIF2      -2.407    -.556   .41
ROAA       -.043    -.052   .06   /     .147     .230   .31
PDF2       5.319    1.570   .28   /   14.465    5.578   .10
ASAA        .172     .160   .67   /     .233     .237   .74
HAA       -3.802   -2.359   .40   /   -4.202   -2.047   .53

R²          .586     .528         /     .426     .365
χ²           227      193         /      157      128

See footnote, table 11
TABLE 14. RECOMMENDED MEASURES AND WEIGHTS FOR
MANEUVER 4, CLIMBING AND DIVING TURNS

Segment: Initial Climb or Dive. Values per aircraft condition: weight
at K=0, weight at the biased K (Light A/C K=.3, Heavy A/C K=.5), COMM.
Six-value rows: Light A/C / Heavy A/C.

MEAS      VALUES

ELF1     -96.020    -.341   .20
ALRG        .466     .413   .72   /     .081     .164   .45
HDAA       1.912    1.584   .76   /    4.666    4.251   .78
THRG        .031     .020   .45   /    -.033    -.021   .25
ASAA        .158     .175   .13   /     .209     .237   .53
ROAA       -.124    -.148   .03   /     .035     .045   .10
PDF2       6.919    2.776   .14

R²          .516     .495         /     .552     .516
χ²           189      179         /      227      206

Segment: Climb or Dive Reversal. Values per aircraft condition:
weight, COMM. Four-value rows: Light A/C / Heavy A/C.

MEAS      VALUES

ELF2       -.183   .07
ALRG        .470   .62   /    .101   .37
HDAF        .721   .36   /    .779   .18
AIF2      -1.497   .24
BERG       -.357   .18
TIME        .070   .59   /    .069   .68
PDF1      -3.131   .15

R²          .585         /    .342
χ²           161         /      93

Segment: Final Climb or Dive. Values per aircraft condition:
weight, COMM. Four-value rows: Light A/C / Heavy A/C.

MEAS      VALUES

ALRG        .472   .72   /    .085   .47
HDAA        .834   .79   /   1.611   .83
ASAA        .051   .60   /    .088   .75
ROAA        .075   .26   /    .065   .28
PSAF        .019   .27

R²          .629         /    .274
χ²           257         /      91

See footnote, table 11.

K values for the last two segments did not change weights
materially.
TABLE 15. MEANS AND STANDARD DEVIATIONS OF DISTRIBUTIONS
IN DISCRIMINANT SPACE¹

                        LIGHT A/C                      HEAVY A/C
MANEUVER           k=0     BIASED k    (k)        k=0     BIASED k    (k)

1.    MEAN²       1.467    1.49392    (0.4)      2.004    2.22700    [illegible]
      S.D.         .775     .54307                .799     .74903

2.    MEAN        1.811    1.82923    (0.4)      2.550    2.55531    (0.4)
      S.D.         .642     .58985                .760     .62989

3.    MEAN        1.481    2.08675    (0.5)      3.998    3.21209    (0.5)
      S.D.         .642     .6?571               1.331     .78041

4-2.  MEAN         .621    1.46996    (0.3)      3.487    3.23179    (0.5)
      S.D.         .694     .70715                .976     .67713

4-3.  MEAN        1.666    1.66642    (0.0)      1.033    1.03348    (0.0)
      S.D.         .642     .64270                .809     .80964

4-4.  MEAN        2.488    2.48889    (0.0)      2.192    2.19262    (0.0)
      S.D.         .607     .60770                .850     .85043

¹See footnote, table 11.

²Group mean in discriminant space shown is for the criterion
days. Group means for early training were negative and of
equal magnitude since the two group discriminant analysis
created a symmetrical coordinate system in discriminant
space.
DISCUSSION

MEASURE SET COMPOSITION. The measure selection analysis data
made two critical points which have an enormous impact on the
design  of  performance  measurement  systems  for  automated  flight 
training. 

First,  control  input  measures  contained  a significant 
amount  of  information  about  training  and  the  effects  of  two 
task  stressors.  Typically,  control  input  measures  are  not 
found  in  many  training  device  measurement  systems.  Even 
advanced  systems,  such  as  the  Automated  Instrument  Flight 
Maneuvers  trainer,  do  not  evaluate  control  inputs,  primarily 
because  without  empirical  data,  such  as  those  contained  herein, 
it  has  been  difficult  to  assess  the  implication  of  control 
input  measures.  The  discriminant  analysis  removes  some  of 
these  difficulties  by  not  only  selecting  measures,  but  also 
assigning  weights  for  the  utilization  of  measures. 

The second critical point has been seen in every measurement
study conducted to date by the authors: Different measure
sets are required when the task changes, even with the simple
addition of light turbulence. Measure set composition changes
alter  both  (a)  the  specific  measures  selected  for  each  task, 
and  (b)  the  weighting  coefficients  for  these  measures  if  the 
data  are  being  summed  into  a single  score. 

Measures  which  are  not  useful  for  one  condition,  but  which 
are  "carried  along"  to  cover  a second  condition,  might  degrade 
the  power  of  the  set  to  describe  the  first  condition.  Thus, 
one  must  be  cautious  in  the  application  of  measure  sets  to 
cover  a variety  of  task  situations.  To  guard  against  degrading 
the  power  of  measurement,  only  empirical  measurement  studies 
offer  an  avenue  to  assure  proper  measure  selection  and  com- 
patibility at  this  time. 

MEASURE  SEGMENT  START/STOP  LOGIC.  Existing  computer  programs 
were  used  to  produce  performance  measures  for  the  present  study. 
In  spite  of  their  broad  capacity  to  define  when  measurement  seg- 
ments start  and  stop,  considerable  testing  was  required  to 
derive  a set  of  logical  conditions  for  starting  maneuver  4, 
segment  4,  the  final  climbing  or  diving  turn. 

The basic problem appeared to be that the existing logic
tested for achieving several criterion conditions simultaneously.
The logic did not permit the following kind of desirable
expression: (a) Look for a 1000 foot altitude change. Then,
after a has been found, stop looking for a and look instead for
(b) altitude to return to within 1000 feet of the initial
value. (c) When b becomes true, start measuring. If evaluated
simultaneously, a and b would be mutually exclusive.
A new type of logical operator appears desirable in the
start/stop logic, especially for maneuvering flight. Sequential
.AND. (.SAND.) is proposed to join together any two logical
or conditional expressions so that, once the first becomes
true, testing of the first stops and testing of the
second expression starts.
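
The proposed operator behaves like a latch and can be sketched as follows. The class name, the altitude trace and the choice to begin testing the second expression on the sample after the first becomes true are assumptions made for illustration; only the two altitude conditions come from the example in the text.

```python
class SequentialAnd:
    """Latch-style .SAND.: test `first` until it is true once, then test only `second`."""

    def __init__(self, first, second):
        self.first = first
        self.second = second
        self.first_seen = False

    def update(self, state):
        if not self.first_seen:
            # Still looking for the first condition; the second is not tested yet.
            self.first_seen = self.first(state)
            return False
        return self.second(state)


initial_altitude = 10000.0


def changed_1000(alt):
    # Condition (a): a 1000 foot altitude change has occurred.
    return abs(alt - initial_altitude) >= 1000.0


def back_within_1000(alt):
    # Condition (b): altitude has returned to within 1000 feet of the initial value.
    return abs(alt - initial_altitude) < 1000.0


start_logic = SequentialAnd(changed_1000, back_within_1000)
trace = [10000.0, 10400.0, 10900.0, 11050.0, 11020.0, 10990.0, 10500.0]
measuring = [start_logic.update(alt) for alt in trace]
print(measuring)
```

A simultaneous .AND. of the same two conditions is never true on this trace, since they cannot hold at the same instant, whereas the sequential form starts measurement once altitude comes back inside the band.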

EQUIVALENT  MEASURES  ANALYSIS.  The  specification  of  initial 
candidate measures is a direct function of the skill of the
analyst. Two kinds of measure specification errors have a high
probability of occurrence. The most probable error appears to
be  overmeasurement.  In  the  face  of  uncertainty  caused  by 
sparse  evidence,  the  tendency  is  to  adopt  the  philosophy,  "If 
it  moves,  measure  it."  The  second  kind  of  error  is  to  omit 
an  important  information  form,  such  as  control  input. 

These two kinds of errors represent a dilemma for the
measurement analyst. If the candidate measure sets are terse,
the risk of missing important information is high. Yet, if
the candidate sets are abundant, the risk and cost of
overmeasurement can be so enormous that data collection becomes
impractical. Even if data collection is possible, the
multivariate procedures for measure selection require seven to nine
times as many data points as input variables to work properly;
data collection requirements are a direct function of the
number of measures initially specified.

The  use  of  correlation  analysis  to  reduce  redundant  forms 
of  information  appears  to  be  a useful  tool  to  ease  the  dilemma. 
It  serves  as  a first  step  check  on  the  analyst.  Also,  it 
permits  the  analyst  a little  latitude  to  experiment  with 
candidate  measures  in  selected  areas  of  uncertainty.  However, 
heavy  dependence  on  the  equivalent  forms  analysis  to  eliminate 
redundant  measures  should  be  avoided. 

MINIMUM  STATISTICAL  SAMPLE.  The  discriminant  analysis  technique 
appears  to  have  been  effective,  but  considering  the  amount  of 
data  which  were  collected,  it  may  be  wondered  if  a smaller 
statistical  sample  could  have  sufficed. 

A relatively small number of participants was used (12).
The adequacy of this sample depends on the population to which
one wishes to extrapolate. On the other hand, a large amount
of data was collected from these participants over a large number of
experimental trials (5184 total). The technique used in this
study  would  be  more  easily  applied  in  the  future  if  the  amount 
of  data  collection  could  be  reduced.  Consequently  it  is  appro- 
priate to  ask  if  sufficient  statistical  power  would  be  main- 
tained with  fewer  observations. 
One may attempt to control two types of errors in the
design of an experiment: (a) the error of asserting that a
result is "real" when in fact it occurred by chance, and (b)
the  error  of  asserting  that  a result  occurred  by  chance  when 
in  fact  it  was  "real."  The  probability  of  the  first  type  of 
error  is  controlled  by  the  use  of  a table  of  statistics  cor- 
responding to  the  desired  probability  of  chance  occurrence. 
The  probability  of  the  second  type  of  error  is  controlled  by 
the  use  of  a sufficient  number  of  observations. 


The necessary sample to achieve the required double
conditions can be determined for the F-test and one-way
comparisons by: (a) specifying the minimum practically
important differences one wishes to detect, (b) determining the
experimental error which will be encountered, and (c) based on
a and b, reading the needed sample from published tables
(cf., Winer, 1962, pp. 657-658; Scheffe, 1959, pp. 438-455).

Now  that  data  are  on  hand,  it  is  possible  to  conduct  such  an 
examination  with  minor  computer  program  modification  and  re- 
analysis. 
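
The kind of calculation involved can be sketched with a normal approximation for comparing two group means. This is a stand-in for, not a reproduction of, the F-test tables cited above; the function name, the default error rates and the example values are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(min_difference, sigma, alpha=0.05, power=0.80):
    """Approximate observations per group needed to detect `min_difference`.

    alpha is the two-sided chance of calling a chance result "real";
    1 - power is the chance of calling a "real" result chance.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / min_difference) ** 2)


# Detecting a difference of one standard deviation versus half a standard deviation.
print(n_per_group(min_difference=1.0, sigma=1.0))
print(n_per_group(min_difference=0.5, sigma=1.0))
```

Halving the smallest difference of practical importance roughly quadruples the required sample, which is why step (a) above has such a strong effect on data collection cost.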

It  is  possible,  also,  to  modify  the  computer  programs 
and  repeat  the  analyses  with  the  amount  of  data  successively 
reduced  to  empirically  find  the  minimum  allowable  sample.  Such 
re-analysis  should  permit  future  applications  of  the  techniques 
of  this  study  with  increased  efficiency. 

"RIDGE"  WEIGHT  STABILIZING.  The  values  of  k selected  to  sta- 
bilize the  weighting  coefficients  were  higher  than  those 
typically used in the literature, and may generate controversy
among mathematical statisticians. It was noted, however, that
the  discriminating  power  of  the  measure  set  was  not  signifi- 
cantly changed  by  the  high  k values.  A partial  validation  of 
the  technique  resulted  when  the  recommended  measures  and  weights 
generalized  to  a new  subject  sample  in  Phase  III. 

AVERAGE  WEIGHTED  SCORES.  The  group  centroids  in  discriminant 
space  cannot  be  used  directly  to  establish  the  expected 
"average"  when  raw  data  are  weighted  and  summed.  This  is  be- 
cause the  discriminant  analysis  transforms  the  data  so  that  they 
are  symmetrical  in  discriminant  space.  The  data  base  must  be 
recalculated  using  recommended  measures  and  weights  to  establish 
the  actual  means  and  standard  deviations  on  the  untransformed 
discriminant function (as can be seen in the next section on
measurement implementation).
SECTION IV

MEASUREMENT  IMPLEMENTATION 

Phase II was a computer software re-design and implementation
effort which required some measurement data re-analysis
and the development of scaling methods to relate new measurements
to the existing adaptive logic. An engineering system test was
conducted to ensure that the system did work during training
conditions.

APPARATUS 

The experimental flight simulator was the TRADEC, located
at  the  Naval  Training  Equipment  Center.  TRADEC  was  converted 
into  an  automated  instrument  flight  trainer  as  described  in 
Section  II. 

TRAINING  COURSE 

The  original  IFM  training  course  consisted  of  65  different 
exercises  which  contained  18  straight  and  level  exercises,  20 
climbs  and  dives,  15  level  turns  and  12  climbing  and  diving 
turns  which  were  ordered  in  increasing  levels  of  "difficulty." 

The  inherent  complexity  of  the  maneuvers  was  one  factor  of 
difficulty.  Other  difficulty  factors  were  changes  in  aircraft 
weight  (center  of  gravity)  and  drag,  atmospheric  turbulence  and 
the  speed  at  which  the  aircraft  was  flown. 

An  analysis  of  the  original  IFM  training  course  suggested 
that  some  of  the  maneuver  task  combinations  were  not  necessary 
for  measurement  evaluation  purposes.  The  training  course  was 
shortened  to  44  different  exercises  which  contained  8 straight 
and  level  runs,  12  climbs  and  dives,  12  level  turns  and  12 
climbing  and  diving  turns  listed  in  table  16.  The  modified 
course  contained  the  fundamental  elements  of  the  original  task 
maneuvers  and  two  combinations  (each)  of  aircraft  center-of- 
gravity  and  airspeed. 

Aircraft  weight  and  center-of-gravity  shift  from  a more 
stable  longitudinal  axis  control  task  to  a less  stable  one  was 
obtained  by  manipulating  fuel  and  external  stores.  The  follow- 
ing conditions  are  shown  in  table  16: 

C.G. Level 1    2400 pounds internal fuel.

C.G. Level 2    2 Sidewinder missiles (stations 2, 8),
                full internal fuel and full center-line
                tank.

Two  airspeeds  were  used  as  shown  in  table  16.  The  slower 
speed,  280  knots,  was  more  difficult  to  fly  than  the  higher 
speed  because  of  aircraft  stability  differences. 
TABLE 16. MODIFIED SYLLABUS

               SEQ   IFM   BANK    IAS    TURN       CLIMB
MAN             #     #    ANGLE   Kts    DEGREE     FEET/MIN       CG

STR & LVL      01    01     0°     350    0          0              1
               02    01     0°     350    0          0              1
               03    01     0°     350    0          0              2
               04    01     0°     350    0          0              2
               05    04     0°     280    0          0              1
               06    04     0°     280    0          0              1
               07    04     0°     280    0          0              2
               08    04     0°     280    0          0              2

CLB & DIV      21    32     0°     350    0          -1000          1
               22    33     0°     350    0          1000           1
               23    32     0°     350    0          -1000          1
               24    33     0°     350    0          1000           2
               25    32     0°     350    0          -1000          2
               26    33     0°     350    0          1000           2
               27    36     0°     350    0          -1000/+1000    1
               28    37     0°     350    0          1000/-1000     1
               29    36     0°     350    0          -1000/+1000    1
               30    37     0°     350    0          1000/-1000     2
               31    36     0°     350    0          -1000/+1000    2
               32    37     0°     350    0          1000/-1000     2

LVL TRN        51    55    30°     350    45         0              1
               52    56    30°     350    -45        0              1
               53    55    30°     350    45         0              1
               54    56    30°     350    -45        0              2
               55    55    30°     350    45         0              2
               56    56    30°     350    -45        0              2
               57    58    30°     350    90/-90     0              1
               58    66    30°     350    -90/+90    0              1
               59    58    30°     350    90/-90     0              1
               60    66    30°     350    -90/+90    0              2
               61    58    30°     350    90/-90     0              2
               62    66    30°     350    -90/+90    0              2

CLB & DIV TRN  71    71    30°     280    90         -1000          1
               72    72    30°     280    -90        1000           1
               73    73    30°     280    -90        -1000          1
               74    74    30°     280    90         1000           2
               75    73    30°     280    -90        -1000          2
               76    74    30°     280    90         1000           2
               77    79    30°     280    90/-90     1000/-1000     1
               78    80    30°     280    -90/+90    -1000/+1000    1
               79    79    30°     280    90/-90     1000/-1000     1
               80    80    30°     280    -90/+90    -1000/+1000    2
               81    79    30°     280    90/-90     1000/-1000     2
               82    80    30°     280    -90/+90    -1000/+1000    2

At the beginning of each exercise, the computer program
would  set  the  aircraft  at  the  initial  run  conditions,  brief 
the  trainee  and  turn  simulator  control  over  to  the  trainee 
when  the  trainee  acknowledged  instructions.  All  runs  that  did 
not  involve  reversals  in  turn  directions  or  reversals  in  verti- 
cal path  were  nominally  one  minute  in  duration.  All  runs  that 
required  reversals  were  nominally  two  minutes  in  length,  al- 
though it  might  take  the  trainee  longer  than  the  nominal  time 
to  perform  them.  Limits  were  placed  in  the  program  to  stop 
exercises  if  reasonable  times  to  perform  were  exceeded.  Crash 
or  completely  out  of  control  conditions  also  stopped  the  exer- 
cise. 

ORIGINAL  IFM  PERFORMANCE  SCORING 

IFM  contained  a performance  measurement  module  and  an 
adaptive  logic  that  permitted  the  student  to  sequence  through 
the  training  course  according  to  his/her  measured  performance. 
Since  the  adaptive  logic  was  based  on  measurement  assumptions, 
it  is  important  to  the  present  effort  to  review  the  rationale 
behind  the  original  scoring  algorithm. 

The  performance  measurement  parameters  were  developed 
from  NATOPS  standards  for  instrument  flight  in  accordance  with 
the performance band limits shown in table 17. It was assumed
that the NATOPS middle bandwidth represented a 95% probability
(±2 standard deviations), and that this level of performance
denotes acceptable performance for an experienced naval aviator
(Charles, et al, 1972). Thus, 95% of all performance by
experienced aviators would fall within the middle bandwidth. It
was further assumed that any error data about the nominal
values were normally distributed, and that the inner bandwidth
represented one standard deviation (about 68% of performance)
and the outer bandwidth represented four standard deviations
(100% of performance).

Errors from the desired values of three parameters were
obtained during execution of each maneuver, as shown in table
18. Each parameter was sampled twice per second and subtracted
from the desired value; the difference was multiplied by a
normalizing constant (see table 19) and summed into a
root-mean-square error score across all three parameters for the
entire run length, which was then divided by the proportion of
the run completed, as shown in table 20.

The resulting error score was positive in value and
increased with poor performance. A total score of 75 would
indicate, for example, that all three parameters were held at the
inner band limits for the entire run (e.g., heading at 5°, altitude
at 100' and airspeed at 5 kts). According to the scoring rationale,
this would represent one standard deviation performance.
Similarly, a total score of 150 would be representative of a
run in which all three parameters were held at the middle
performance limit, or two standard deviation performance.
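
The scoring arithmetic can be checked with a short sketch. It assumes that the total is the sum of one root-mean-square score per parameter, an interpretation chosen here because it reproduces the totals of 75 and 150 quoted above; the function names, the sample values and the `proportion_completed` argument are illustrative and do not come from the IFM program.

```python
import math

# Normalizing constants as listed in table 19 of the report.
K = {"heading": 5.00, "altitude": 0.25, "airspeed": 5.00,
     "vertical_velocity": 0.10, "turn_rate": 50.00, "bank_angle": 10.00}


def parameter_score(desired, samples, k):
    """Root-mean-square of the normalized error over all samples of one parameter."""
    errors = [k * (desired - actual) for actual in samples]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def total_score(run, proportion_completed=1.0):
    """Sum the per-parameter scores, then divide by the proportion of the run completed."""
    total = sum(parameter_score(desired, samples, K[name])
                for name, (desired, samples) in run.items())
    return total / proportion_completed


n = 120  # one minute at two samples per second
inner = {"heading": (90.0, [95.0] * n),        # held 5 degrees off
         "altitude": (10000.0, [10100.0] * n),  # held 100 feet off
         "airspeed": (350.0, [355.0] * n)}      # held 5 knots off
middle = {"heading": (90.0, [100.0] * n),
          "altitude": (10000.0, [10200.0] * n),
          "airspeed": (350.0, [360.0] * n)}
print(total_score(inner), total_score(middle))
```

Each parameter held at its inner band limit contributes 25 points, so the normalizing constants place all six parameters on a common scale of 25 points per assumed standard deviation.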
TABLE 17. ORIGINAL IFM PERFORMANCE BAND LIMITS

PARAMETER             INNER          MIDDLE         OUTER

Heading               ±5°            ±10°           ±20°
Altitude              ±100'          ±200'          ±400'
Airspeed              ±5 Kts         ±10 Kts        ±20 Kts
Vertical Velocity     ±250'/Min      ±500'/Min      ±1000'/Min
Turn Rate             ±0.5°/Sec      ±1.0°/Sec      ±2.0°/Sec
Bank Angle            ±2.5°          ±5°            ±10°

TABLE  18.  ORIGINAL  IFM  PARAMETERS  SCORED 


PARAMETER

MANEUVER 

HEAD- 

ING 

BANK 

ANGLE 

TURN 

RATE 

ALTI- 

TUDE 

RATE 

OF 

CLIMB 

IAS 

Straight  & 
Level 

X 

X 

X 

Climbs  & 
Dives 

X 

X 

X 

Level  Turns 
Fixed  Angle 
Fixed  Rate 

X 

X 

X 

X 

X 

X 

Climbing  and 
Living  Turns 

X 

X 

X 
TABLE 19. ORIGINAL IFM WEIGHTING COEFFICIENTS


PARAMETER            COEFFICIENT (K)

Heading                   5.00
Altitude                  0.25
Airspeed                  5.00
Vertical Velocity         0.10
Turn Rate                50.00
Bank Angle               10.00

TABLE 20. ORIGINAL IFM SCORING ALGORITHM


\[ S_p = K \, (P_d - P_a) \]

where  S_p = parameter error score
       P_d = desired value of parameter, P
       P_a = actual value of parameter, P
       K   = parameter normalizing constant


\[ S_t = \frac{1}{T} \sum_{p=1}^{3} \sqrt{\frac{1}{N} \sum_{i=1}^{N} S_{p,i}^{2}} \]

where  S_t = total score for run
       S_p = error score for each of three parameters sampled
       N   = number of samples
       T   = proportion of run time completed, t_a / t_i
       t_i = ideal time, the time required to complete a
             perfect maneuver
       t_a = actual time
ORIGINAL IFM ADAPTIVE LOGIC

Based  on  the  measurement  algorithm,  an  adaptive  logic 
was  developed  to  permit  the  student  to  advance  through  the 
training  course.  The  logic  is  shown  in  table  21.  At  the  end  of 
a given run, a student would either advance one, two or three
numbered exercises in the sequence, stay at the same exercise, or
go back one, two or three exercises as a function of his current
performance and whether he had advanced or moved backwards
in the syllabus on his previous run.


TABLE 21. ORIGINAL IFM ADAPTIVE LOGIC


Previous Run
Sequence Number
Increment Status    St>200   200>St>150   150>St>100   100>St>50   50>St

- (Decremented)       -3        -2            0            0         +1
0 (No Change)         -2        -1           +1           +1         +2
+ (Incremented)       -1         0           +1           +2         +3
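
The branching rule of table 21 can be written as a small lookup. The following is a minimal sketch, not the IFM program code; the names `BOUNDARIES`, `STEP` and `next_step` are hypothetical, and the treatment of a score that falls exactly on a boundary is an assumption, since the table gives only strict inequalities.

```python
import bisect

# Band boundaries for the total score St (table 21), in ascending order.
BOUNDARIES = [50, 100, 150, 200]

# Change in exercise number for each previous-run status, indexed by
# score band from best (St < 50) to worst (St > 200).
STEP = {
    "decremented": [+1, 0, 0, -2, -3],
    "no_change":   [+2, +1, +1, -1, -2],
    "incremented": [+3, +2, +1, 0, -1],
}


def next_step(score, previous_status):
    """Return the exercise increment or decrement for one completed run."""
    # Assumption: a score exactly on a boundary is placed in the higher
    # (poorer) band. Table 21 does not specify this case.
    band = bisect.bisect_right(BOUNDARIES, score)
    return STEP[previous_status][band]


print(next_step(75, "no_change"))     # band 100 > St > 50
print(next_step(250, "decremented"))  # band St > 200
```

Because the step depends on the previous run's status as well as the score, the same score can advance a trainee who is already moving forward and hold one who has just been moved back.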

MAPPING  NEW  MEASURES  INTO  EXISTING  ADAPTIVE  LOGIC 

For  subsequent  measurement  evaluation  purposes  it  was 
necessary  to  have  three  scoring  systems,  (1)  the  original  IFM 
scoring  system,  (2)  a system  based  on  DISCRIM  recommended 
measures  and  weights  and  (3)  a system  based  on  observed, 
normative  IFM  scores.  A method  to  scale  the  second  and  third 
measurement  systems  into  the  adaptive  logic  was  required. 

SCALING METHOD. In the original IFM adaptive logic (table 21),
the decision to branch was made on the basis of the assumed
distribution of the total IFM score, St, where:

St = 75 was assumed to be 1-sigma performance, and

St = 150 was assumed to be 2-sigma performance for the
experienced naval aviator.

Therefore, branching decisions can be expressed as a function of
score standard deviations (z-scores), as follows:


\[
S_t = 50 = 0.667\sigma, \quad
S_t = 100 = 1.333\sigma, \quad
S_t = 150 = 2.000\sigma \quad \text{and} \quad
S_t = 200 = 2.667\sigma.
\]

By computing z-scores, the second and third measurement
system  scores  can  be  scaled  into  the  existing  adaptive  logic 
without  changing  the  rationale  upon  which  the  adaptive  logic 
was  designed. 

DISCRIM MEASUREMENT SCALING. The Phase I data were recomputed
using the recommended measures and weights (tables 11 - 15).
On each trial, each recommended measure was multiplied by its
respective weight. A single score for each trial was computed
by summing the weighted measures. This new metric for each
trial was called the total weighted score (Stw). The average
total weighted scores are shown in table 22 along with their
standard deviations for each maneuver and segment.


For purposes of establishing a z-score, criterion data
were drawn from Day 7 for the light aircraft and from Day 6 for
the heavy aircraft. Thus for every trial of a given maneuver
(and for each segment within a maneuver), a score would be
computed for evaluation by the adaptive logic as follows:


\[ S_z = \left| \frac{S_{tw} - S_{twcm}}{S_{twcs}} \right| \]


where,

Sz = the total score expressed as the absolute value of
standard deviations of criterion performance,

Stw = the total weighted score for each segment,

Stwcm = the Stw mean performance on the criterion day,

Stwcs = the Stw standard deviation on the criterion day.


Where maneuvers contained more than one segment, the Sz value
passed to the adaptive logic would be the average of all Sz
values. If any segment failed to start or stop, Sz would be
set to 2.700-sigma for that segment.
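
The scaling just described can be sketched as follows. This is an illustration under stated assumptions rather than the program module itself: the function names are hypothetical, a failed segment is represented here by `None`, and the example criterion values are the Maneuver 1, light aircraft, Day 7 mean and standard deviation from table 22.

```python
FAILED_SEGMENT_SZ = 2.700  # value assigned when a segment fails to start or stop


def segment_sz(stw, criterion_mean, criterion_sd):
    """Absolute distance of Stw from the criterion mean, in criterion SDs."""
    return abs(stw - criterion_mean) / criterion_sd


def maneuver_sz(segments):
    """Average Sz over all segments of a maneuver.

    segments: list of (stw, criterion_mean, criterion_sd) tuples, with
    stw set to None for a segment that failed to start or stop.
    """
    values = []
    for stw, mean, sd in segments:
        if stw is None:
            values.append(FAILED_SEGMENT_SZ)
        else:
            values.append(segment_sz(stw, mean, sd))
    return sum(values) / len(values)


# Maneuver 1, light aircraft: Day 7 mean 1.49352, S.D. .54307 (table 22).
one_sd_above = segment_sz(1.49352 + 0.54307, 1.49352, 0.54307)
print(round(one_sd_above, 3))

# A hypothetical two-segment maneuver in which the second segment failed.
print(round(maneuver_sz([(2.03659, 1.49352, 0.54307),
                         (None, 1.49352, 0.54307)]), 3))
```

Taking the absolute value means that a score below the criterion mean is penalized in the same way as one above it, which is the behavior the definition of Sz implies.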


During system engineering tests it was discovered that
negatively weighted measures could cause misclassification of
exceptionally poor performance (such as turning the wrong way).
In each case, the poor performance was found to be far outside
the measurement space of the Phase I data. The maximum
values for all negatively weighted measures observed in the
Phase I data base are shown in table 23.


To guard against the possibility of misclassification by
the discriminant scoring model, all negatively weighted measures
were first tested against the limits in table 23. If on any
trial a negatively weighted measure was greater than its limit,
Stw was not computed, and a constant Sz of 2.700 was returned
to the adaptive logic.
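
A minimal sketch of this limit test is given below. The measure names, weights and values are invented for illustration (the actual weights are in tables 11 - 15); only the bound of 6.780 is taken from table 23, and the function name is hypothetical.

```python
OUT_OF_RANGE_SZ = 2.700  # constant Sz returned when a limit is exceeded


def total_weighted_score(measures, weights, upper_bounds):
    """Return Stw, or None if a negatively weighted measure exceeds its bound.

    measures, weights: dicts keyed by measure name.
    upper_bounds: dict of limits for the negatively weighted measures only.
    """
    for name, value in measures.items():
        if weights[name] < 0 and value > upper_bounds[name]:
            return None  # outside the Phase I measurement space
    return sum(weights[name] * value for name, value in measures.items())


# Hypothetical weights and measure names, for illustration only.
weights = {"m_positive": 0.5, "m_negative": -0.1}
bounds = {"m_negative": 6.780}

in_range = total_weighted_score({"m_positive": 2.0, "m_negative": 3.0},
                                weights, bounds)
out_of_range = total_weighted_score({"m_positive": 2.0, "m_negative": 8.0},
                                    weights, bounds)

print(in_range)      # a normal Stw, passed on to the z-score computation
print(out_of_range)  # None: the caller would return Sz = OUT_OF_RANGE_SZ
```

Without the test, a very large value on a negatively weighted measure would lower Stw and could make exceptionally poor performance look like criterion performance.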
TABLE 22. AVERAGE TOTAL WEIGHTED SCORES FOR USE IN
NEW MEASUREMENT SYSTEM


                    LIGHT A/C               HEAVY A/C
MAN             DAY 1      DAY 7        DAY 2      DAY 6

1     MEAN     3.16813    1.49352      3.54622    2.22700
      S.D.      .54307     .54307       .74903     .74903
      N         132        132          144        144

2     MEAN     3.43938    1.82923      4.10383    2.55531
      S.D.      .58985     .58985       .62989     .62989
      N         132        132          144        144

3     MEAN     3.57394    2.08675      4.45655    3.21209
      S.D.      .66571     .66571       .78041     .78041
      N         132        132          144        144

4-2   MEAN     2.87751    1.46996      4.69867    3.23179
      S.D.      .70775     .70775       .67713     .67713
      N         133        133          144        144

4-3   MEAN     3.19163    1.66642      2.1*973    1.03348
      S.D.      .64270     .64270       .80964     .80964
      N          94         94          114        114

4-4   MEAN     4.07226    2.48889      3.23810    2.19262
      S.D.      .60770     .60770       .85043     .85043
      N         132        132          144        144
TABLE 23. UPPER BOUNDS OF NEGATIVELY WEIGHTED MEASURES


MANEUVER 

SEGMENT 

MEAS 

LIGHT  A/C 

MEAS 

HEAVY  A/C 

1 

PTRG 

6.780 

AIRG 

2.010 

PSRG 

7.700 

PTSD 

1.607 

HRG 

355.000 

PSRG 

7.136 

ASAA 

13.578 

i 

2 

HDAA 

623.000 

HDAA 

780.000 

ROAA 

5.094 

3 

AIF2 

0.656 

ALRG 

7.259 

ROAA 

13.365 

HAA 

176,000 

HAA 

281.000 

4-2 

ELF1 

0.017 

THRG 

5.435 

ROAA 

31.498 

4-3 

ELF  2 

2.710 

AIF2 

1.337 

BERG 

3.428 

PDF1 

1.136 

4-4 

— 

— 

— 

— 

NORMATIVE IFM MEASUREMENT SCALING (NEW IFM). Original IFM scores
(St) were collected during Phase I. It was observed that the
data from the subject sample did not agree with the assumed
performance norms (ie average performance was assumed to be
St=75). Table 24 suggests that the original IFM adaptive logic
was too lenient for straight and level flight and too demanding
for climbing and diving turns based on Day 6 and Day 7 data.

Since the original IFM measurement represented an analytically
specified, criterion referenced measurement system, a version
based on observed performance norms of IFM measurement was
prepared for subsequent evaluation.

The IFM scores from Phase I had the characteristics of a
Poisson distribution; the mean represented 1-sigma performance.
Day 6 and Day 7 means for each maneuver and C.G. condition were
multiplied by 0.667, 1.333, 2.000, and 2.667 to determine the
adaptive logic decision values shown in table 25. From a
programming viewpoint, it was easier to replace the decision
values than to compute z-scores for NEW IFM scoring. The
result was equivalent. Thus all three measurement systems
were scaled into the adaptive logic in an equivalent manner.
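
The decision values can be regenerated from the criterion-day means with a few lines. This is a sketch for checking the arithmetic, with hypothetical names; the means are those of table 24 (Day 7 for the light aircraft, Day 6 for the heavy aircraft). The light-aircraft products agree with the Fore C.G. rows of table 25, and the heavy-aircraft products with the Aft C.G. rows, to within one unit; the rounding rule used for the printed table is not stated, so the sketch prints unrounded products.

```python
# z-score multipliers for the four decision boundaries, highest first.
Z_MULTIPLIERS = (2.667, 2.000, 1.333, 0.667)

# Criterion-day mean IFM scores from table 24.
CRITERION_MEANS = {
    "light": {"straight_level": 34, "climbs_dives": 50,
              "level_turns": 65, "climbing_diving_turns": 94},
    "heavy": {"straight_level": 34, "climbs_dives": 57,
              "level_turns": 68, "climbing_diving_turns": 120},
}


def decision_values(mean_score):
    """Scale one criterion mean into the four adaptive-logic boundaries."""
    return [mean_score * z for z in Z_MULTIPLIERS]


for condition, means in CRITERION_MEANS.items():
    for maneuver, mean_score in means.items():
        values = ", ".join(f"{v:6.1f}" for v in decision_values(mean_score))
        print(f"{condition:5s} {maneuver:22s} {values}")
```

Replacing the four boundary constants per maneuver in this way leaves the branching table itself untouched, which is why the text describes it as equivalent to computing z-scores.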


TABLE 24. AVERAGE IFM SCORES FROM PHASE I


                             LIGHT A/C          HEAVY A/C
MANEUVER                   DAY 1   DAY 7      DAY 2   DAY 6

STRAIGHT & LEVEL             69*     34         55      34
CLIMBS & DIVES              125      50        144      57
LEVEL TURNS                 146      65        121      68
CLIMBING & DIVING TURNS     221      94        206     120

*N = 144
TABLE 25. ADAPTIVE LOGIC FOR ALL SCORING SYSTEMS


SCORING             LOGIC DECISION VALUES AND RESULTING RUN
SYSTEM              INCREMENT OR DECREMENT (BELOW)

I.   ORIGINAL IFM   St>200     200>St>150       150>St>100       100>St>50        50>St
II.  DISCRIM        Sz>2.667   2.667>Sz>2.000   2.000>Sz>1.333   1.333>Sz>0.667   0.667>Sz
III. NEW IFM        St>*       *>St>*           *>St>*           *>St>*           *>St

     Fore C.G.
       Man. 1            91            68              45              22
       Man. 2           133           100              67              33
       Man. 3           173           130              87              43
       Man. 4           251           188             125              62

     Aft C.G.
       Man. 1            91            68              45              22
       Man. 2           152           114              76              38
       Man. 3           181           136              91              45
       Man. 4           320           240             160              80

Previous Run
Sequence Status

- (Decremented)      -3          -2               0                0             +1
0 (No Change)        -2          -1              +1               +1             +2
+ (Incremented)      -1           0              +1               +2             +3

*The criterion of St shown below for each maneuver (man.) and
C.G. condition to be inserted here.
MEASUREMENT IMPLEMENTATION

The IFM system computer programs were modified to incorporate
the new measurement systems and to permit subsequent
measurement evaluations. Considerable programming was required.
The modifications to specific program modules are outlined in
Appendix D. A summary of those modifications follows:

1.  The  instruction  syllabus  was  shortened  as  shown  in  table 
16.  Basically,  an  intermediate  level  of  c.g.  and  all 
turbulence  conditions  were  removed.  Also,  some 
unnecessary  combinations  of  climbing  and  diving  turns 
were  eliminated. 

2.  Real-time  plotting  of  IFM  measure  time  histories  on  the 
I I DOM  was  removed  from  the  program  to  decrease  operating 
complexity  and  increase  storage  space. 

3. Maneuver segmentation for measurement purposes was
added. The segmentation algorithms included the logical
operators and conditional test functions described in
previous measurement work.

4.  The  capability  of  sampling  each  parameter  at  a unique 
rate  was  added. 

5.  Each  measure  was  defined  as  a parameter,  desired  value 
and  transform,  per  previous  work. 

6.  The  Stw  measurement  algorithm  and  limit  tests  were 
added. 

7.  The  Sz  measurement  algorithm  was  added. 

8.  The  NEW  IFM  measurement  algorithm  was  added. 

9.  The  program  was  modified  to  operate  either  according  to 
the  old  IFM,  DISCRIM  or  NEW  IFM  measurement  systems  by 
selecting  sense  switch  options. 

10. The performance summary line printer output was modified
to include DISCRIM measures in their raw form, weighted
measures, the sum of weighted measures (Stw), Sz, and the
criterion Stw (where multiple segments exist).

11.  A tape  writing  module  was  created  to  output  all  subject 
and  performance  data  on  magnetic  tape  at  the  end  of 
each  trial. 
SYSTEM TEST PROCEDURES

A system check-out was conducted to ensure that the program
was working properly, that measures were being properly sampled,
transformed, weighted and acted upon, that the maneuver
segmentation rules worked, that the line printer output was
correct, and that sufficient foreground processing time resulted.
This test was not intended to be any kind of system evaluation.

The tests were conducted informally by checking out each
module change as applicable, and by flying the system with
each measurement system controlling training. Two test
trainees were used; they were low-time private pilots who had
only light aircraft experience. Testing with the second trainee
revealed the potential misclassification problem with the initial
DISCRIM measurement system (previously discussed) and brought
about the solution to that problem.
SECTION V

MEASUREMENT SYSTEM EVALUATION


The purpose of Phase III was to conduct a pilot study to
evaluate measurement development progress to date. The three
measurement techniques which resulted from Phase I and II were
evaluated by empirical comparison of the time-to-train (to
criterion) of three groups of six novice pilots each, using the
original IFM (Group I), discriminant (Group II) or normative
IFM (Group III) scoring subsystems in IFM.

METHOD 

APPARATUS. The TRADEC and automated Instrumented Flight Maneuvers
(IFM) program was made to operate with the three scoring subsystems
described in Section IV.

TRAINEES. Fifteen civilian light aircraft pilots, 17 to 40 years
old, were used. An attempt was made to restrict the pilot
sample  to  high-time  Student  Pilots  or  low-time  Private  Pilots 
who  had  between  one  and  five  hours  of  instrument  time.  It  was 
expected  that  this  sample  would  approximate  the  population  that 
might  benefit  from  IFM  automated  training. 

MATCHING GROUPS. In addition to the above criteria, pilots were
divided into three equivalent groups, matched on two variables,
recency and first run IFM scores. Recency was calculated as
follows: The total of hours flown in the last 10 days plus hours
flown in the last two months was divided by 10. The second
variable was the first IFM trial score after initial practice.

It was not possible to test all pilots for group assignment
at one time because of the uncertainty of volunteer pilot
schedules over the 10 weeks required to collect data. Matching
was done when pilots arrived for their first session, by assigning
each pilot so as to keep the running means of first scores and
recency as equivalent as possible. Of course, the degrees of
freedom to match accurately were reduced as the experiment
progressed.

PROCEDURES.  At  the  first  session  the  test  conductor  briefed  the 
trainee  on  the  purpose  of  the  study,  use  of  the  data,  the  TRADEC 
flight  instruments  and  controls,  the  differences  between  high 
performance  aircraft  and  light  aircraft  and  the  study  procedures. 
Each  pilot  was  given  between  one  and  three  practice  trials  to 
demonstrate  ability  to  control  the  simulator. 

The pilot was then selected for one of the three scoring
systems using the matching method, and given a sequential subject
number within group (ie Subject 3, Group 2). All trainee data
and performance records were indexed only by subject and group
number. There was no way to link the data records to a specific
person without knowing his/her subject and group number; the
index between subject/group number and individuals was destroyed
at the end of the study.

After  group  assignment,  the  trainee  was  placed  under  full 
control  of  the  automated  training  system.  Pilots  were  permitted 
to  fly  45  to  50  minutes  under  control  of  their  assigned  scoring 
system.  On  successive  days,  the  training  syllabus  was  started 
with the last exercise flown on the previous day. Training
continued until the last exercise was flown and a passing score was
achieved.

MEASUREMENT. Since automated IFM trained to criterion (as
expressed by the measurement and adaptive logic described in
Section IV), the dependent variable was the number of trials
required, or the time-to-train (the terms are used
interchangeably), to complete the course. Both IFM and Sz scores
(see Section IV) were available to assess performance quality
as well.

RESULTS 

MATCHING  GROUPS.  There  were  no  statistically  significant 
differences  between  groups  for  trainee  data  shown  in  table  26. 
Inspection  of  the  distributions  and  trends,  however,  suggested 
that  recency  and  total  flight  time  favored  Group  I.  First  trial 
IFM scores favored Group III. Age favored Group II. Sz, the Discrim
score, was not sensitive for matching at this initial stage of
training; many scores of 2.700 indicated that the model
measurement space was exceeded during initial matching runs.

RAW RESULTS. There were no significant differences between
groups on the last trial IFM or Sz scores; groups were trained
to equivalent performance levels (table 27). The number of trials
to  achieve  this  performance  was  significantly  different  for 
Group  II,  representing  a 72%  reduction  in  the  time-to-train  over 
Group  I.  Group  III  was  not  significantly  different  from  Group  I 
or  II.  It  was  suspected,  however,  that  these  results  may  have 
been  biased  by  imperfect  group  matching. 

VARIABLES  AFFECTING  GROUP  COMPOSITION.  Correlations  were 
calculated  between  the  variables  shown  in  table  28.  Group 
membership  was  set  to  0 for  Group  I,  to  2 for  Group  II,  and  to  1 
for  Group  III  (in  order  of  performance)  for  correlation  analysis 
purposes.  Group  membership  correlated  with  number  of  trials  with 
an  r=  -.47,  accounting  for  only  22%  of  the  variance  in  the  data. 
The partial correlation between groups and trials, holding first
score constant, was r = -.51. The partial correlation between
groups and trials, holding age constant, was r = -.19. Age and
first score were biasing the data.
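
For readers who wish to check these figures, the standard first-order partial correlation formula can be applied to the simple correlations of table 28. The sketch below does this for the first-score case only; the function and variable names are hypothetical, and the three input correlations are read from table 28 (groups with trials, groups with first score, trials with first score).

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simple correlations from table 28.
r_groups_trials = -0.47
r_groups_first = 0.02
r_trials_first = 0.34

# Proportion of trial variance associated with group membership alone.
variance_explained = r_groups_trials ** 2

# Groups vs. trials with first score held constant.
r_partial_first = partial_correlation(r_groups_trials,
                                      r_groups_first,
                                      r_trials_first)

print(f"variance explained by groups: {variance_explained:.2%}")
print(f"partial r, first score held constant: {r_partial_first:.2f}")
```

Holding first score constant strengthens the group effect slightly because first score is almost uncorrelated with group membership but does account for some of the variance in trials.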

A stepwise  multiple  regression  (Heal,  1971)  was  performed 
with  variables  one  through  six  available  as  predictors;  variable 
seven  (trials)  was  the  criterion.  The  stepwise  process  permitted 
only  significant  predictors  to  enter  the  model,  based  on  preset 
F-ratio criteria. The F-level required to enter or be rejected
TABLE 26. TRAINEE DATA


                          TOTAL              INST     FIRST TRIAL SCORES
GROUP          RECENCY    TIME(1)    AGE     TIME(2)    IFM       Sz

I.    MEAN      2.47      327.3     28.20     4.9      150.35    2.50
      S.D.      1.24      627.7      6.38     8.5       91.61     .44

II.   MEAN       .69       89.0     23.80     2.4      154.45    2.70
      S.D.       .74       44.8      9.31     2.1       82.84     .00

III.  MEAN       .96       97.5     28.00     5.9      112.71    2.38
      S.D.       .72       74.1      1.41     7.9       42.04     .71

(1) Total flight time in hours.
(2) Total instrument time in hours.


TABLE 27. RAW RESULTS


               FIRST      LAST TRIAL        RAW        PERCENT
GROUP          TRIAL      IFM       Sz      TRIALS     IMPROVEMENT

I.    MEAN     150.35    117.90    1.10     98.20
      S.D.      91.61     58.45     .74     48.49

II.   MEAN     154.45     95.92    1.07     56.80(1)      72%
      S.D.      82.84     27.08     .22     28.22

III.  MEAN     112.71    128.76    1.35     62.20(2)      57%
      S.D.      42.04     68.89     .41     21.20

(1) Significant, Mann-Whitney U=3, p=.028.
(2) Not significant.
TABLE 28. SIMPLE CORRELATIONS BETWEEN VARIABLES


VARIABLE             1       2       3       4       5       6

1. RECENCY          1.00
2. TOTAL TIME        .56    1.00
3. AGE               .01    -.10    1.00
4. INST TIME         .35     .62    -.04    1.00
5. FIRST SCORE      -.26    -.39    -.21    -.27    1.00
6. GROUPS           -.46    -.28    -.29    -.16     .02    1.00
7. TRIALS           -.04    -.21     .68    -.26     .34    -.47

r = .514 sig., p = .05, two tailed; r = .414 for one tailed

TABLE 29. MULTIPLE REGRESSION RESULTS


                                     STD
PREDICTORS        B        b        ERROR      F

AGE              .7888    4.570     .082      11.43(1)
FIRST SCORE      .5011     .258     .924       9.87(1)

MULTIPLE R = .8415, R2 = .7082
CRITERION = NO. TRIALS
PREDICTION EQUATION(2)

TRIALS = 4.570 AGE + 0.258 FIRST SCORE - 85.420

STANDARD ERROR = 21.755

(1) Significant, p<.001, 2/12 df.
(2) Describes this data base only.
from the regression analysis was set to F=3.59, which would
permit up to three predictors at 3/11 df.

Only  two  predictors  entered  the  multiple  regression,  age  and 
first  score  as  shown  in  table  29.  Age  and  first  score  taken 
together  and  weighted  could  predict  the  number  of  trials  with  a 
standard  error  of  21.76  (trials),  without  regard  to  group 
membership,  and  accounted  for  70%  of  the  variance  in  the  data. 
With  this  result,  the  effects  of  age  and  first  score  could  be 
partitioned,  thereby  statistically  equating  the  groups  on  these 
significant  variables. 
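
The prediction equation of table 29 can be applied directly. The sketch below evaluates it at each group's mean age and mean first score (table 26); the function name is hypothetical. The results fall within 0.1 trial of the predicted-trial means printed in table 30 (82.29, 63.23 and 71.66), the small differences presumably reflecting rounding of the published coefficients.

```python
def predicted_trials(age_years, first_score):
    """Prediction equation from table 29 (describes this data base only)."""
    return 4.570 * age_years + 0.258 * first_score - 85.420


# Group means of (age, first trial IFM score) from table 26.
GROUP_MEANS = {
    "I": (28.20, 150.35),
    "II": (23.80, 154.45),
    "III": (28.00, 112.71),
}

predicted = {group: predicted_trials(age, first)
             for group, (age, first) in GROUP_MEANS.items()}

for group, trials in predicted.items():
    print(f"Group {group:3s} predicted trials: {trials:6.2f}")
```

Because the equation is linear, evaluating it at the group means gives the same result as averaging the predictions for the individual pilots in the group.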

EVALUATION RESULTS. Age and first trial effects were removed
from the data by subtracting the number of predicted trials from
the raw trials and forming a difference score (DIFF in table 30).
The  difference  scores  placed  Group  I performance  15  trials  above 
the  grand  mean,  Group  II  six  trials  below  the  grand  mean  and 
Group  III  nine  trials  below  the  grand  mean.  Both  Groups  II  and 
III  were  significantly  different  from  Group  I. 

The difference scores were added to the grand mean of trials
to form an adjusted number of trials (ADJ TRIALS in table 30).
With the effects of age and first trial scores thus removed,
Group II produced a 34% reduction, and Group III produced a 40%
reduction in the time-to-train over Group I.
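
The adjustment can be reproduced from the group mean difference scores and the grand mean in table 30. In the sketch below the names are hypothetical. The divisor used for the percentages is an inference, not something the text states: the tabled 34% and 40% are matched, to within one percentage point, when the difference from Group I is divided by the comparison group's adjusted trials.

```python
GRAND_MEAN_TRIALS = 72.40

# Group mean difference scores (raw minus predicted trials), table 30.
MEAN_DIFF = {"I": 15.91, "II": -6.43, "III": -9.47}

# Adjusted trials = difference score + grand mean of trials.
adjusted = {group: diff + GRAND_MEAN_TRIALS
            for group, diff in MEAN_DIFF.items()}


def percent_relative_to_group(baseline_trials, group_trials):
    """Difference from the baseline as a percentage of the group's trials.

    Assumption: this divisor is inferred from the tabled percentages.
    """
    return 100.0 * (baseline_trials - group_trials) / group_trials


pct_group_2 = percent_relative_to_group(adjusted["I"], adjusted["II"])
pct_group_3 = percent_relative_to_group(adjusted["I"], adjusted["III"])

for group, trials in adjusted.items():
    print(f"Group {group:3s} adjusted trials: {trials:6.2f}")
print(f"Group II vs I:  {pct_group_2:.1f}%")
print(f"Group III vs I: {pct_group_3:.1f}%")
```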

On a maneuver by maneuver basis, Discrim scoring held
trainees in straight and level flight longer than either IFM
scoring system (table 31). Discrim scoring permitted trainees
to  pass  through  climbs  and  dives  and  level  turns  faster  than 
either  IFM  scoring  system,  and  through  climbing  and  diving  turns 
faster  than  Old  IFM  scoring.  Note  that  these  data  were  based  on 
raw  (unadjusted)  trials. 

The  performances  of  three  typical  pilots  who  were  close  to 
their  group  means  are  presented  in  Appendix  E.  These  graphs 
plot  the  progress  of  each  trainee  through  the  syllabus  by  trial. 
They show that Discrim scoring tended to hold the trainee in the
first  exercise  of  straight  and  level  flight  much  longer  than 
either  of  the  two  IFM  scoring  systems.  Also,  both  IFM  scoring 
systems  produced  noticeably  more  instabilities  (oscillations  up 
and  down  the  exercise  list)  than  Discrim  scoring. 

Three subjects trained on Old IFM scoring volunteered the
comment that the scoring and adaptive logic seemed arbitrary a
few times when their perceived performance did not agree with
the automated judgments. No such comment was volunteered for the
other two scoring systems.

All subjects had problems with the Cognitronics corrective
messages during early training. The Cognitronics issued
corrections when altitude, heading, airspeed, rate of descent or
bank angle were out of tolerance. When multiple performance errors
occurred, the corrective messages would "stack up" in a queue,
TABLE 30. MEASUREMENT EVALUATION RESULTS


               FIRST               RAW      PREDICT              ADJ         PERCENT
GROUP          TRIAL     AGE      TRIALS    TRIALS     DIFF      TRIALS(3)   IMPROV

I.    MEAN     150.35    28.20    98.20     82.29     15.91      88.31
      S.D.      91.61     6.38    48.49     37.81     12.71

II.   MEAN     154.45    23.80    56.80     63.23     -6.43(1)   65.97        34%
      S.D.      82.84     9.31    28.22     39.36     20.32

III.  MEAN     112.71    28.00    62.20     71.66     -9.47(2)   62.93        40%
      S.D.      42.04     1.41    21.20     15.42     19.11

GRAND MEAN                        72.40     72.40      0.00      72.40

(1) GROUP II vs I, Mann-Whitney U=4, sig, p=.048.

(2) GROUP III vs I, Mann-Whitney U=0, sig, p<.001.

(3) Adjusted trials = DIFFerence + TRIAL GRAND MEAN.


TABLE 31. NUMBER OF RAW TRIALS TO COMPLETE
EACH MANEUVER


           STRAIGHT     CLIMBS      LEVEL     CLIMBING AND
GROUP      & LEVEL      & DIVES     TURNS     DIVING TURNS

I.          15(1)         30          32           21
II.         22            16           9            9
III.        17            22          16            7

(1) Average No. trials, N=5 per group
awaiting delivery of previous messages. Often a coaching
message  would  occur  after  corrective  action  was  taken,  causing 
the  trainee  to  overcorrect.  Later  in  training  the  error  rates 
were  down  and  the  trainees  learned  to  ignore  the  messages. 

DATA  COLLECTION  NOTES.  Twenty-nine  pilots  volunteered  for  the 
study  in  response  to  notices  given  to  three  general  aviation 
fixed base operators in the Orlando, Florida area. Eight
volunteers were ruled out because of very high total flight or
instrument hours. Two potential trainees were excused after
matching because they could not be assigned a Group without
significantly unbalancing recency or first scores. One trainee
started but did not finish due to continued conflict with his
work schedule. When data collection was finished, there were
6 subjects in each group (N=18).

Three  trainees  over  40  years  old  were  omitted  (the  oldest 
from  each  group)  during  preliminary  analyses  because  (1)  the  age 
effect  was  more  pronounced  than  anticipated,  (2)  they  were 
outside  the  expected  age  range  of  potential  automated  IFM  system 
users,  (3)  they  were  outside  the  age  range  of  the  Phase  I data 
from which both Discrim and Normative IFM measurement "models"
were  derived,  (4)  their  outlying  performance  introduced  an 
unprecedented  amount  of  variance  in  the  data,  and  (5)  in  one  case, 
the  trainee  did  not  appear  to  be  very  adaptable  to  automated 
training  techniques  as  configured  in  IFM. 

Data collection required 10 weeks, scheduling an average of
four hours of system time each day for an average of five days
a week (M,T,T,F,S). About one hour a day (or one day a week)
was lost due to trainee no-show, trainee late or system
malfunction (in order of decreasing occurrence).

DISCUSSION 

The results offered encouraging evidence that empirical
methods can improve upon analytically derived measurement and cause
a substantial increase in the efficiency of training. Flight
simulators are scheduled heavily in the field. Present and future
systems can be expected to be burdened with even higher
utilization due to more training required by more complex systems,
tasks and pressures to conserve fuel. A 40% increase in training
efficiency would have a substantial impact in field training.

AGE.  Subject  age  was  a more  powerful  influencer  of  complex 
psychomotor  training  performance  than  the  measurement  systems, 
where  the  range  of  age  in  the  sample  was  between  17  and  40  years. 
Although  we  did  not  need  to  perform  a study  to  learn  that,  we  had 
to  be  certain  that  age  (and  other  variables)  were  not  biasing  the 
data  in  favor  of  one  measurement  system  over  another.  The  use  of 
the  prediction  equation  removed  the  bias,  and  was  conservative 
because  there  was  correlation  (table  28)  between  age  and  groups 
(ie some of the group effect was removed by the procedure).
The magnitude of the age effect suggested that it should
be  included  in  any  future  application  of  the  discriminant 
analysis  measure  selection  technique.  It  should  become  one  of 
the  candidate  measures  along  with  other  student  history  variables 
as  well.  Future  studies  of  this  type  should  match  groups  on  age 
and  first  score. 

VALUE  OF  NORMS.  The  time-to-train  improvement  for  the  normative 
IFM  measurement  group  suggested  the  efficiencies  that  can  be 
obtained  by  simply  collecting  empirical  data  and  adjusting 
analytically  derived  measurement  according  to  performance  norms. 
This  has  implication  for  retro-fit  or  situations  where  the 
discriminant  (or  other  multivariate)  techniques  are  not  feasible. 

DISCRIM  PARTIALLY  VALIDATED.  The  discriminant  model  developed 
in  Phase  I generalized  to  a new  sample  of  pilots  who  had  58%  more 
total  flying  hours,  54%  less  instrument  time  and  who  were  16% 
older.  It  also  trained  as  well  as  the  "criterion  referenced" 
normative  IFM  measures.  This  suggests  some  validity  in  the 
model  as  a whole,  which  included  (1)  removal  of  the  components 
of  variance  to  create  "independent"  samples,  (2)  the  use  of  the 
multiple  discriminant  model  for  measure  selection,  and  (3)  the 
"ridge"  method  to  stabilize  the  weights. 

TOWARD MORE COMPREHENSIVE MEASUREMENT. The discriminant model
did not perform any better overall than the normative IFM model.
Discrim, however, has advantages that may lead to an improvement
in efficiency beyond normative criterion referenced models. The
principal advantage is that DISCRIM SELECT can accept non-system
performance measures and properly weight and evaluate them in a
set that also contains system performance measures.

For example, if pilot age had been included in the Phase I
candidate measure set, it probably would have emerged as a
recommended measure (based on Phase III results). If it had,
the evaluation results would probably have been closer to the
raw results (table 27) than the adjusted results because one of
the measure groups would have been sensitive to the age effect,
and would have absorbed some of the age effect variance. There
are undoubtedly several student history variables that are just
as important to performance assessment as the system performance
measures.

PILOTING  TECHNIQUE.  The  discriminant  model  essentially  described 
a trained  person  in  multidimensional  space,  which  included 
control  input  measures  as  well  as  outer-loop  (ie  heading, 
altitude  and  airspeed)  measures.  It  is  possible  that  Discrim 
scoring  was  sensitive  to  pilot  control  technique  as  well  as 
overall  system  performance. 

REINFORCEMENT.  Discrim  scoring  was  not  sensitive  to  performance 
differences  during  matching  and  held  trainees  in  the  very  first 
straight  and  level  exercise  for  a long  time.  Decisions  made  on 
Discrim  scoring  required  that  the  trainee  start  performing  like 
a trained person in all dimensions (including control input)
before it would permit any advancement. Pilots so trained did
not receive any positive reinforcement (advancement) until they
developed sufficient technique and performance. Once that
happened, they progressed rapidly through the syllabus.

In  contrast,  both  IFM  scoring  systems  were  less  demanding; 
only outer-loop measures had to be within bounds. IFM scoring
systems  may  have  permitted  advancement  prior  to  the  development 
of  good  basic  piloting  technique.  Trainees  may  have  been 
incorrectly  rewarded  by  advancement  (note  comments  by  subjects 
that  IFM  seemed  arbitrary  at  times)  and  were  still  trying  to 
discover  proper  technique  while  encountering  new  tasks.  This 
could  have  caused  the  instabilities  that  were  seen  in  both  IFM 
scoring  systems. 

SINGLE  SCORE  MEASUREMENT.  We  are  not  convinced  that  adaptive 
logics  which  require  movement  through  a syllabus  based  on  a 
single  score  produce  the  most  efficient  training.  Performance 
is  multidimensional,  and  measurement  can  be  made  to  diagnose 
at  least  major  problems.  For  example,  if  a student  during  a 
climbing  turn  has  problems  controlling  the  turn,  that  problem 
is  easily  measured.  Diagnosis  of  the  problem  and  subsequent 
action  by  the  adaptive  logic  might  produce  more  efficiencies 
and  perhaps  better  training. 

MEASUREMENT  RELATIVE  TO  STUDENT  EXPERIENCE.  Early  in  training 
a student may not need to perform within 2-sigma of end-of-course
criteria. If a student is within the performance range of other
students with his experience (and those norms converge on
end-of-course criteria), then the student is performing as expected,
and  should  be  permitted  to  advance.  Adaptive  logics  can  be 
designed  to  make  judgments  based  on  such  norms.  When  the  system 
is  first  installed  it  can  start  operation  with  assumed  norms 
that  can  be  programmed  to  adjust  after  sufficient  data  are  accrued. 
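
The norm-referenced decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `RunningNorm`, the 2-sigma band as a default, and the `min_samples` threshold of 30 are hypothetical choices, not values taken from the study.

```python
from statistics import mean, stdev


class RunningNorm:
    """Norm for one experience level (hypothetical sketch).

    Starts from an assumed mean and standard deviation and switches to
    measured values once enough student scores have accrued.
    """

    def __init__(self, assumed_mean, assumed_sd, min_samples=30):
        self.assumed_mean = assumed_mean
        self.assumed_sd = assumed_sd
        self.min_samples = min_samples  # assumption: data needed before adjusting
        self.samples = []

    def add(self, score):
        """Record one observed score from a student at this experience level."""
        self.samples.append(score)

    def current(self):
        """Return (mean, sd): measured if enough data exist, else assumed."""
        if len(self.samples) >= self.min_samples:
            return mean(self.samples), stdev(self.samples)
        return self.assumed_mean, self.assumed_sd

    def permits_advance(self, score, k=2.0):
        """True if the score lies within k sigma of the norm for this level."""
        m, s = self.current()
        return abs(score - m) <= k * s
```

One `RunningNorm` per experience level (for example, per training day) would let the adaptive logic judge a student against peers rather than against end-of-course criteria alone.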

COMMENTS  ON  AUTOMATED  TRAINING  SYSTEM  DESIGN.  Although  the 
purpose  of  our  work  was  to  develop  and  evaluate  measurement, 
several  comments  on  the  design  of  automated  training  systems  can 
be  made  on  the  basis  of  the  training  problems  that  were  observed. 
These  comments  might  be  helpful  to  designers  of  next  generation 
systems,  and  are  contained  in  Appendix  F. 

SECTION VI
CONCLUSIONS

The purpose of the program was to develop improved measure
selection techniques, implement the results of those techniques
in an automated flight training system and evaluate resulting
measurement. All program objectives were achieved. The conclusions
of each phase are presented in the following:

MEASURE  SELECTION 

Noting  that  task  analytic  methods  often  produce  an  abundance 
of  measurement,  empirical  techniques  were  explored  to  reduce 
analytically  derived  measurement  to  a smaller,  more  manageable 
set  that  would  be  sensitive  to  the  change  in  performance  during 
training.  The  major  conclusions  of  the  measure  selection  method 
development  work  were: 

1.  It  is  necessary  to  perform  a good  analysis  of  each 
maneuver  to  specify  candidate  measurement  based  on 
operational  requirements  and  the  research  literature. 

2. Candidate measures should be specified in terms of the
parameters to be sampled, the rates at which they are
sampled, their desired values (if any) and the
transformation.

3.  Extreme  care  is  necessary  in  the  specification  of 
unambiguous  rules  for  starting  and  stopping  measurement. 

4. It is necessary to conduct empirical measurement selection
studies to collect data on the candidate measures
during training for subsequent measure selection analyses.

5. Testing means of individual measures for significant
changes between early and late in training reduces measurement;
however, this method does not consider the
complexity of performance or the interrelations between
measures, and does not provide a method to weight measures
for the construction of an overall score.

6.  Eliminating  highly  correlated  measures  is  an  effective 
method  to  reduce  redundant  information,  serves  as  a 
first  step  filter  and  permits  the  analyst  a little 
latitude  to  specify  extra  measures  in  selected  areas  of 
uncertainty;  also,  it  is  necessary  if  multivariate 
analyses  are  to  be  used. 

7. Canonical correlation analyses are effective for
selecting those measures out of a battery that predict
later measures; however, the method (a) often produced
asymmetrical predictive and criterion sets, (b) was
difficult to interpret and reduce to an algorithm required
for mapping measurement into an adaptive logic,
and (c) was omitted from further development at this
time. However, it may be useful for diagnosis and prescription
of performance in more complex, or branching
adaptive logics.

8.  The  multiple  discriminant  analysis  can  be  modified  to 
form  an  effective  technique  for  selecting  and  weighting 
those  measures  which  best  discriminate  between  early  and 
later  training;  however,  in  order  to  use  this  method,  it 
is  necessary  to: 

a. collect data on all major tasks and variations to
those tasks (such as center-of-gravity change,
turbulence, etc.) both early and late in training,

b. remove highly correlated measures,

c. have a minimum of 5 to 7 times as many observations
as variables (candidate measures),

d. correct statistically for repeated observations on the
same trainees (if repeated observations were taken),

e. specify the minimum communality of any measure and
the minimum number of measures (in terms of percent
variance of the smallest factor),

f. stabilize the beta weights using modified "ridge"
analysis techniques for more reliable prediction.

9. The methods used to partition the variance due to repeated
observations and to stabilize the weighting
coefficients should be further studied along with
methods to reduce sampling requirements for more efficient
data collection.

10. The measures and weighting coefficients that emerge
from the modified multiple discriminant analysis can
be used to form a single score, the discriminant function,
for use by adaptive logics that require a single score.

11.  Control  input  measures  were  often  important  in  describing 
the  differences  between  skilled  and  unskilled  performance. 
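
The first-step filter in conclusion 6 (eliminating highly correlated measures) might look like the following Python sketch. It is an illustration only: the function names are hypothetical, the 0.90 threshold follows the r ≥ .90 criterion noted under the equivalent-measure tables, and the rule of keeping the first measure of each correlated chain is an assumption rather than the procedure documented in the report.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_correlated(measures, threshold=0.90):
    """Return the names of measures to keep.

    measures: dict mapping measure name -> list of observations.
    A measure is dropped when |r| >= threshold against any measure
    already kept (assumption: earlier entries take priority).
    """
    kept = []
    for name, values in measures.items():
        if all(abs(pearson_r(values, measures[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

The surviving measures would then be passed to the multivariate selection step, which assumes the redundant measures have already been removed.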

MEASURE  IMPLEMENTATION 

The  recommended  weights  and  measures  which  resulted  from  the 
multiple  discriminant  analysis  were  mapped  into  the  automated 
training system (IFM), forming a second measurement subsystem.

A third  measurement  subsystem  was  created  by  modifying  the 
adaptive  logic  to  operate  on  norms  of  the  original  IFM  measures, 
based  on  data  acquired  during  measure  selection  studies.  The 
major  conclusions  of  the  implementation  effort  were: 

1. Means and standard deviations of the discriminant
function must be computed in measurement space (DISCRIM
SELECT output is in discriminant space) to determine
criterion performance.

2.  The  discriminant  model  can  misclassify  poor  performance 
if  that  poor  performance  is  on  a negatively  weighted 
measure  that  has  a magnitude  outside  the  measurement 
space  of  the  original  data  (used  to  produce  the 
discriminant function).

3. Misclassification is easily circumvented by a
hierarchical algorithm which first tests negatively
weighted measures to ensure that they are within
4-sigma  of  their  average  in  the  original  data.  If  the 
unweighted  measure  fails  the  test,  the  discriminant 
function  is  set  to  2.7-sigma.  If  the  measure  passes 
the  test,  the  discriminant  function  is  computed. 

4.  A rational  way  to  scale  different  measurement  system 
outputs  into  the  adaptive  logic  is  through  z-scores  of 
criterion  performance. 

5.  Real-time  programming  of  the  measures,  weights,  start 
and stop rules, and hierarchical model was achieved in
the  TRADEC/IFM  within  the  50  millisecond  program  cycle 
time;  measurement  included  control  input  power 
approximation  in  the  frequency  domain  through  the  use  of 
digital  high  and  low-pass  filters,  sampling  20  times  per 
second. 
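
Implementation conclusions 3 and 4 can be read as the following Python sketch. It is a minimal illustration, not the TRADEC/IFM code: the function and argument names are hypothetical, the discriminant function is taken to be a plain weighted sum with no constant term, and returning +2.7 as the z-score on a failed guard test is one interpretation of "set to 2.7-sigma" (the sign convention is an assumption).

```python
def hierarchical_discrim_z(measures, weights, norms, crit_mean, crit_sd,
                           guard_sigma=4.0, fail_z=2.7):
    """Score one trial as a z-score relative to criterion performance.

    measures: dict name -> observed (unweighted) measure value
    weights:  dict name -> discriminant weighting coefficient
    norms:    dict name -> (mean, sd) of that measure in the original data
    """
    # Step 1: guard test on negatively weighted measures only.
    for name, w in weights.items():
        if w < 0:
            m, s = norms[name]
            if abs(measures[name] - m) > guard_sigma * s:
                # Outside the original measurement space: do not trust the
                # discriminant function; return the fixed fail value.
                return fail_z
    # Step 2: all guards passed, so compute the discriminant function.
    d = sum(w * measures[name] for name, w in weights.items())
    # Step 3: scale into the adaptive logic as a z-score of criterion
    # performance (mean and sd computed in measurement space).
    return (d - crit_mean) / crit_sd
```

Scaling every measurement subsystem to the same z-score units is what allows one adaptive logic to operate on any of the three scoring modes.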

MEASURE  EVALUATION 

Empirically  derived  measurement  systems  were  substituted  in 
an  existing  automated  instrument  flight  maneuvers  training  system 
with  the  result  that  time  to  train  to  the  same  criterion  was 
reduced  34-40%.  It  was  concluded  that: 

1.  If  this  result  holds  in  subsequent  validation,  the  users 
of  advanced  and  retro-fitted  training  systems  (that 
contain  measurement  improved  by  empirical  techniques) 
can  look  forward  to  improved  efficiency  and  utilization 
of  those  devices. 

2.  In  existing  automated  training  devices  that  have 
measurement,  these  levels  of  increased  efficiency  should 
result  by  modifying  the  adaptive  logic  to  operate  on 
actual  performance  norms  rather  than  assumed  norms  in 
their  scoring  algorithms. 

3.  The  approach  taken  in  the  development  of  the  modified 
multiple  discriminant  analysis  for  selecting  measures 
(DISCRIM  SELECT)  was  partially  validated. 

4. The discriminant model measurement appeared to be
sensitive to piloting technique and to provide more
reliable performance feedback.

5.  The  discriminant  model  can  be  expected  to  produce 
better  measurement  in  future  efforts  than  was  shown  in 
the  evaluation  because  it  can  select  and  properly 
weight  (along  with  system  measures)  student  history 
variables  (such  as  age)  which  have  been  shown  to  be 
very  important. 

6.  The  measure  production  and  selection  techniques  herein 
described  have  produced  improvement  to  analytically 
derived  measurement  of  a sufficient  magnitude  to 
warrant  application  of  these  techniques  with  the  end 
goal  of  specifying  measurement  for  future  and  existing 
flight  training  systems.  In  order  to  apply  the 
techniques,  data  collection  in  field  training 
environments  is  required. 

Although  the  purpose  of  the  program  was  to  develop  and 
evaluate  measurement,  several  conclusions  concerning  the  design 
of  automated  training  systems  are  related  to  measurement  and  can 
be  made  from  the  data: 

1.  Linear,  single  score  adaptive  logics  similar  to  the 
configuration  of  IFM  may  not  be  efficient  enough  to  use 
in  operational  training.  The  interaction  between  the 
syllabus  exercises,  adaptive  logic  and  measurement  does 
not  always  permit  the  good  trainee  to  advance  rapidly. 
Marked  improvement  should  result  by: 

a.  Limiting  the  number  of  exercises  within  a maneuver 
to  only  those  that  have  operational  relevance. 

b.  Removing  exercises  from  the  main  line  sequence  (or 
removing  them  altogether)  that  only  provide  task 
variation  or  stressors  such  as  turbulence. 

c. Strongly inhibiting, or removing altogether, backward
movement through the syllabus.

d. Constructing the score on the basis of performance
norms.

2.  Adaptive  logics  which  require  a single  score  do  not  take 
advantage  of  the  power  of  measurement  to  diagnose 
performance  and  lead  to  better  prescription  of  training. 
Branching  logics  based  on  more  than  one  measure  should 
be  more  efficient. 

3.  It  may  not  be  necessary  to  expect  a student  to  perform 
within  2-sigma  of  end  of  training  criteria  in  all  cases. 
The  measurement  system  should  be  designed  to  evaluate 
performance against norms based on time in training.
Assumed norms can be used until sufficient data accrue
to change them.

SECTION VII
RECOMMENDATIONS

It is recommended that:

1.  The  techniques  described  herein  be  improved  and  used  to 
produce  and  select  measurement  for  existing  and  future 
automated  flight  training  systems. 

2.  An  operational  flight  training  site  (or  sites)  be  selected 
for  performance  data  collection  in  a military  flight 
training  simulator  environment.  Subsequent  analyses  of  the 
data should lead eventually to a specification for measurement
for the maneuvers and class of aircraft tested.

3.  Initial  field  measurement  activities  be  limited  to 
instrument  flight  or  weapon  delivery  phases  of  simulator 
training  where  initial  conditions  and  prescribed  flight 
paths are known and specifiable; however, it is possible
and recommended that other flight regimes (where criteria
can be specified) be explored for measurement possibility.

4. The results of initial field studies (i.e., recommended measures
and weights) be installed in the field systems,
and an evaluation of the new measurement be conducted
to determine the training impact (similar to the Phase III
evaluation reported herein).

5.  Continued  improvement  to  DISCRIM  SELECT  be  undertaken  by 
incorporating  other  nonsystem  performance  measures  such  as 
age,  time  in  training,  and  student  history,  and  by  further 
research  with  the  existing  data  base. 

6. Consideration be given to adding to the Phase I data base some
early trials with turbulence and turbulence in combination
with aft center-of-gravity, so that measures for those task
stressors  can  be  produced. 

7.  Statistical  issues  brought  about  by  the  use  of  multivariate 
methods  for  measure  selection  be  further  studied;  these 
issues  include,  but  are  not  limited  to  (a)  methods  to 
partition  the  variance  due  to  repeated  measures,  (b)  methods 
to  stabilize  the  weighting  coefficients  and  (c)  methods  to 
possibly  reduce  sampling  requirements. 

8.  Existing  and  future  single  score,  linear  adaptive  logics  be 
limited  as  described  herein,  and  that  scoring  be  based  on 
performance norms throughout training. Future systems
should contain performance data files that make the
conversion from initially assumed norms to actual norms
convenient, and changes to the scoring algorithms possible
without reprogramming.

9. Future automated training systems be designed with branching
(or at least lateral) logics that make decisions on more
than one performance score, and that the construction and
weighting of those scores be readily amenable to change
without reprogramming.

10. IFM be modified, and a study conducted to determine the
efficacy of (a) a limited linear adaptive logic (as in
Conclusions), (b) a lateral logic which permits graduation
from  task  variation  trials  to  the  next  maneuver,  and  (c)  a 
limited  criterion  test,  branching  logic.  Since  the 
mechanisms  are  all  in  place,  minimum  resource  expenditures 
could  provide  substantial  guidance  for  future  system  designers. 


APPENDIX A

RAW  DATA  AND  MEASUREMENT  FUNCTIONS  AND  TRANSFORMS 
AVAILABLE  IN  CURRENT  MEASUREMENT  PROGRAMS 

TABLE 32. REAL TIME RAW DATA PARAMETERS FROM SIMULATOR

     PARAMETER                       UNITS                 ABBREVIATION

 1.  SYSTEM CLOCK COUNT                                    CLOK
 2.  ELEVATOR STICK FORCE            POUNDS                ELVF
 3.  ELEVATOR STICK DISPLACEMENT     INCHES                ELVS
 4.  ANGLE OF ATTACK                 UNITS                 ALPH
 5.  PITCH ATTITUDE                  DEGREES               PTCH
 6.  CLIMB/DESCENT RATE              FEET PER MINUTE       HDOT
 7.  ALTITUDE                        FEET                  ALT
 8.  RIGHT THROTTLE DISPLACEMENT     DEGREES               THRR
 9.  AIRSPEED                        KNOTS                 A/S
10.  AILERON STICK FORCE             POUNDS                AILF
11.  AILERON STICK DISPLACEMENT      INCHES                AILS
12.  ROLL ATTITUDE                   DEGREES               ROLL
13.  TURN RATE                       DEGREES PER SECOND    TURN
14.  HEADING                         DEGREES               HEAD
15.  RUDDER PEDAL FORCE              POUNDS                RUDF
16.  RUDDER PEDAL DISPLACEMENT       INCHES                PED
17.  SIDESLIP                        DEGREES               BETA
18.  TURBULENT AIR INTENSITY         ARBITRARY UNITS       RUFF

NAVTRAEQUIPCEN 74-C-0063-1

TABLE 33. GLOSSARY OF START/STOP FUNCTIONS*

MNEMONIC   FUNCTION            START/STOP WHEN:

B                              Beginning of Record
E                              End of Record
P                              End, Best Fit Power of 2
G          PAR > DSR           Parameter Greater than Desired Value
L          PAR < DSR           Parameter Less than Desired Value
O          |PAR-DSR| > TOL     Absolute value of parameter minus
                               desired value is greater than (outside
                               of) tolerance
I          |PAR-DSR| < TOL     Absolute value of parameter minus desired
                               value is less than (inside) tolerance
CO         |PAR-INIT| > TOL    Absolute value of parameter minus its
                               initial value is greater than tolerance
                               (or the change from initial is outside
                               of tolerance)
CI         |PAR-INIT| < TOL    Absolute value of parameter minus its
                               initial value is less than the tolerance

TABLE 34. GLOSSARY OF LOGICAL OPERATORS FOR
COMBINING START/STOP FUNCTIONS*

MNEMONIC   EACH PAIR OF FUNCTIONS (F) IS EVALUATED TRUE IF:

A          F1 is True and F2 is True
O          F1 is True or F2 is True
N          F1 is True and F2 is False
r          F1 is False and F2 is False

* These logical and relational expressions could be expanded
as necessary.
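
As an illustration of how the start/stop functions and logical operators in Tables 33 and 34 could be evaluated, the following Python sketch covers the six relational functions and three of the operators. The function names `evaluate` and `combine` and their argument names are hypothetical and are not taken from the measurement programs.

```python
def evaluate(fn, par, dsr=0.0, tol=0.0, init=0.0):
    """Evaluate one start/stop function mnemonic against a parameter value.

    par: current parameter value, dsr: desired value,
    tol: tolerance, init: parameter value at the start of the record.
    """
    if fn == "G":
        return par > dsr
    if fn == "L":
        return par < dsr
    if fn == "O":
        return abs(par - dsr) > tol
    if fn == "I":
        return abs(par - dsr) < tol
    if fn == "CO":
        return abs(par - init) > tol
    if fn == "CI":
        return abs(par - init) < tol
    raise ValueError(f"unknown function mnemonic: {fn}")


def combine(op, f1, f2):
    """Combine two evaluated functions with a logical operator mnemonic."""
    if op == "A":
        return f1 and f2
    if op == "O":
        return f1 or f2
    if op == "N":
        return f1 and not f2
    raise ValueError(f"unknown operator mnemonic: {op}")
```

For example, a rule such as "start measuring when altitude is inside tolerance and heading has changed from its initial value" would be `combine("A", evaluate("I", alt, dsr, tol), evaluate("CO", head, tol=5.0, init=head0))`, with all variable names being placeholders.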

TABLE 35. GLOSSARY OF TRANSFORMATIONS

MNEMONIC   TRANSFORMATION

INIT       Initial Scalar Value
FINL       Final Scalar Value
AINI       Absolute Initial Scalar Value
AFIN       Absolute Final Scalar Value
MIN        Minimum Value
MAX        Maximum Value
AVG        Average Value, $\frac{1}{N}\sum_{i=1}^{N} x_i$
AAE        Average Absolute Value, $\frac{1}{N}\sum_{i=1}^{N} |x_i|$
ERS        Error Squared Value, $\frac{1}{N}\sum_{i=1}^{N} x_i^2$
VAR        Variance, $\sum_{i=1}^{N} x_i^2 - \frac{1}{N}\left(\sum_{i=1}^{N} x_i\right)^2$
RMS        Root-Mean-Square, $\left(\frac{1}{N}\sum_{i=1}^{N} x_i^2\right)^{1/2}$
SDV        Standard Deviation, $\left[\frac{1}{N-1}\left(\sum_{i=1}^{N} x_i^2 - \frac{1}{N}\left(\sum_{i=1}^{N} x_i\right)^2\right)\right]^{1/2}$
TOT        Time Out of Tolerance in Seconds and Tenths
RNG        Range, Distance Between the Largest and Smallest Value
ELT        Elapsed Time in Seconds and Tenths
ZRX        No. Zero Crossings per Second
AVX        No. Average Crossings per Second
AUTO       Auto Covariance Function
PERD       Periodicity of Auto Covariance Function, the
           tau shift values and covariance at peaks.
MLTR       Multiple Regression of a Parameter x and its
           derivative ($\dot{x}$) on Parameter y (Cooley and
           Lohnes, 1962). This particular transform
           computes successive multiple regressions of
           x, $\dot{x}$ on later (tau) values of y (as in an
           auto covariance function) until the maximum
           multiple regression coefficient is found.
           It returns (1) Tau in seconds, (2) the
           coefficient of multiple regression, (3)
           the Beta weights and (4) B-weights at the
           point of maximum multiple regression.
HARM       Harmonic Analysis using procedures outlined by
           Blackman and Tukey (1959), Cooley and Tukey
           (1965) and Villasenor (1968) produced
           the power spectral density function for the
           requested bandwidth.
FLTR       Relative power between 2 and 6 radians
           per second using a pair of low-pass
           second-order digital filters as described
           by Norman (1973).
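
Several of the scalar transforms in Table 35 are simple enough to sketch in Python. This is a hedged illustration, not the original measurement program: `x` is assumed to be a list of sampled error values (parameter minus desired value), `dt` is the sample period in seconds (0.05 s would correspond to the 20 samples per second mentioned in the Conclusions), and the N-1 divisor in `sdv` is an assumption about the standard deviation formula.

```python
from math import sqrt


def avg(x):
    """AVG: average value."""
    return sum(x) / len(x)


def aae(x):
    """AAE: average absolute value."""
    return sum(abs(v) for v in x) / len(x)


def rms(x):
    """RMS: root-mean-square."""
    return sqrt(sum(v * v for v in x) / len(x))


def sdv(x):
    """SDV: standard deviation from running sums (assumes an N-1 divisor)."""
    n = len(x)
    return sqrt((sum(v * v for v in x) - sum(x) ** 2 / n) / (n - 1))


def rng(x):
    """RNG: distance between the largest and smallest value."""
    return max(x) - min(x)


def tot(x, tol, dt):
    """TOT: time out of tolerance, in seconds."""
    return sum(dt for v in x if abs(v) > tol)


def zrx(x, dt):
    """ZRX: number of zero crossings per second."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if a * b < 0)
    return crossings / (len(x) * dt)
```

Forms based on running sums, such as `sdv`, suit real-time use because only the sum and the sum of squares need to be held in a collection buffer between samples.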


AVERAGE MANEUVER 1 (STRAIGHT & LEVEL) MEASURES

Measure significantly different than Day 6 (vs Days 2 and 4) and Day
(vs Remaining days), p<.05 based on t-tests; 142 D/F.

TABLE 37. AVERAGE MANEUVER 2 (CLIMBS & DESCENTS) MEASURES

TABLE 39. AVERAGE MANEUVER 4, SEGMENT 2 (INITIAL CLIMB/DIVE TURN) MEASURES

TABLE 40. AVERAGE MANEUVER 4, SEGMENT 3 (CLIMB/DIVE & TURN REVERSAL) MEASURES

CANDIDATE MEASURES    DAY 1 VS DAY 7    DAY 2 VS DAY 6    DAY 7 VS DAY 8    DAY 7 VS DAY 9

ELRG
ELF1
ELF2
AIRG
AIF1
AIF2
PDRG
PDF1
PDF2
ALRG
ALSD
PTRG
PTSD
ROAA
RORM
PSRM
PSRG
HAA
HRG
HDAA
HDRG
ASAA
ASRG

* Chains of measures which intercorrelate, r ≥ .90, for
each comparison day.

TABLE 44. MANEUVER 3 (LEVEL TURNS) EQUIVALENT MEASURES

CANDIDATE MEASURES    DAY 1 VS DAY 7    DAY 2 VS DAY 6    DAY 7 VS DAY 8    DAY 7 VS DAY 9

ELF1
ELF2
ALRG
ALSD
PTSD
AIF1
AIF2
ROAA
RORM
PDF1
PDF2
BERG
BERM
ASAA
ASRM
HAA
THRG

* Chains of measures which intercorrelate, r ≥ .90, for each
comparison day.

TABLE 45. MANEUVER 4, SEGMENT 2 (INITIAL CLIMB/DIVE TURN)
EQUIVALENT MEASURES

CANDIDATE MEASURES    DAY 1 VS DAY 7    DAY 2 VS DAY 6    DAY 7 VS DAY 8    DAY 7 VS DAY 9

ELF1
ELF2
ALRG
HDAA
THRG
ASAA
AIF1
AIF2
BERM
ROAA
PDF1
PDF2
PSAF
TIME

TABLE 46. MANEUVER 4, SEGMENT 4 (FINAL CLIMB/DIVE TURN)
EQUIVALENT MEASURES

CANDIDATE MEASURES    DAY 1 VS DAY 7    DAY 2 VS DAY 6    DAY 7 VS DAY 8    DAY 7 VS DAY 9

ELF1
ELF2
ALRG
HDAA
THRG
ASAA
AIF1
AIF2
BERM
ROAA
PDF1
PDF2
PSAF
TIME

* Chains of measures which intercorrelate, r ≥ .90, for each
comparison day.

TABLE 47. MANEUVER 4, SEGMENT 3 (CLIMB/DIVE TURN REVERSAL)
EQUIVALENT MEASURES

CANDIDATE MEASURES    DAY 1 VS DAY 7    DAY 2 VS DAY 6    DAY 7 VS DAY 8    DAY 7 VS DAY 9

ELF1
ELF2
ALRG
HDAF
AIF1
AIF2
BERG
ROAF
PDF1
PDF2
TIME

* Chains of measures which intercorrelate, r ≥ .90, for each
comparison day.


APPENDIX  D 

IFM  Program  Modifications  to  Incorporate  Performance 
Measurement  Techniques 


The  following  program  changes  were  made  to  the  Instrument 
Flight  Maneuvers  program  to  incorporate  real-time  performance 
measurement.  Modifications  are  listed  by  module  name  whenever 
a new  module  was  added,  an  old  module  deleted  or  the  initial 
module  was  altered. 

1. ATE System Parameters (APAM). This data module was changed
to  reflect: 

a.  The  deletion  of  the  IDIIOM  graphics  display  buffers  and 
associated  parameters. 

b.  The  deletion  of  data  not  specifically  required  for  the 
performance  measurement  update. 

c.  The  addition  of  data  and  parameters  needed  to  support 
the  performance  measurement  update. 

d.  Modifications  to  the  Task  Description  Table  Definition 
List  to  tailor  the  tasks  to  the  IFM-PM  syllabus. 

e. The addition of the magnetic tape buffer and the associated
data parameters necessary to support the
magnetic tape output records.

f.  The  inclusion  of  the  parameters  and  allied  data  required 
to  implement  the  three  (3)  scoring  modes: 

(1)  Original  IFM  (with  turbulence  removed) 

(2)  Original  IFM  modified  to  utilize  Normative  Data 

(3)  Discriminate  Analysis 

g. The revision and update of line printer messages, scoring
tables, adaptive logic constants, boundary limits, etc.,
necessary to support the scoring modes and magnetic tape
module.

h.  The  revision  of  the  Difficulty  Level  tables  to  remove 
turbulence  as  a difficulty  factor  from  the  IFM  runs. 

2. Task Description Parameters (TDP). This data module was
changed to reflect:

a. The deletion of tasks not required for the IFM-PM update.

b. A change to the Task Description Table format to permit
the addition of Segment Description Tables.

c.  The  incorporation  of  Segment  Description  Tables  (SDT's) 
which  provide  the  program  the  following  data  for  each 
maneuver,  configuration  and  parameter. 

(1)  Rate  at  which  parameter  is  sampled. 

(2) A pointer to the Start Measurement Conditions (SMC)
table.

(3) A pointer to the Stop Measurement Conditions (TMC)
table.

(4) A pointer to the parameter to be measured.

(5) A pointer to the desired value of the parameter.

(6) A list of the transforms to be performed on the
parameter.

d. The incorporation of Start/Stop Measurement Conditions
Tables (SMC's/TMC's). These tables describe under what
conditions  the  measurement  of  each  parameter  listed  in 
the  SDT's  is  to  be  started  and  terminated.  These  tables 
contain: 

(1)  The  status  of  the  parameter. 

(2)  A pointer  to  the  parameter  to  be  tested. 

(3) The function (i.e., greater than, equal to, etc., to
some desired value).

(4) A pointer to the desired value of the parameter.

(5)  A tolerance  for  the  desired  value. 


(6) A conditional which generates another set of items
(1) through (5) above. Examples of conditionals are:
Logical OR, Logical AND, Sequential AND.

e. The incorporation of the following real-time tables and
buffers to support the performance measurement functions:

(1)  Segment  Rate  Table  - reflects  the  rate  at  which  each 
parameter  is  to  be  sampled. 

(2) Segment Description Table - a pointer which corresponds
to each item in the SRT pointing to the
appropriate buffer in the Event Segment Table (EST).



(3)  Event  Segment  Table  (EST)  - A real-time  buffer 
containing  data  for  all  the  parameters  to  be  sampled 
for  the  current  event.  It  is  compiled  from  data 
supplied by the SDT's for the segments required. It
contains  the  following  information: 

(a)  Sampling  rate. 

(b)  The  SMC/TMC  index. 

(c)  A pointer  to  the  parameter  being  sampled. 

(d)  A pointer  to  the  desired  value. 

(e)  A list  of  transforms  to  be  performed  on  the 
parameter. 

(f)  A pointer  to  a collection  buffer  assigned  each 
transform. 

(4) Start Measurement/Terminate Measurement Table
pointers. These are tables which point to the
appropriate Start/Stop Measurement Conditions Tables.
An index to these tables is placed in (3)(b) above.

(5) Collection Buffers - These buffers are used by each
parameter transform to collect data in real-time.
Their individual length is dependent upon the type
of transform (amount of data required for the
transform).

f.  The  addition  of  a table  which  specifies  which  parameters 
are  available  for  output  to  magnetic  tape. 

3. ATE Modifications (AMOD). The emergency procedures were
deleted from this module.

4. AFT Modifications (AFTM). No changes.

5. ATE Executive Routines (ATEX). The average rate of climb and
rate of turn computations were removed from foreground
processing and placed in the background program Parameter
Update (PMUP). A routine needed to convert turn rate from
radians per second to degrees per second was added.

6. Trim Aircraft (TRCZ). No changes.

7. Pseudo-Hearing (PSH). No changes.

8. Timing Control (TIMR). The graphics display timer was
removed.

9. PM SDT Processor (SDT). This was a new module added to the
list of foreground processors. This routine interrogates the
Segment Rate Tables (SRT) and, if it is time to sample, fetches
the appropriate parameter, performs the specified transforms
and places the intermediate results in the collection buffers.
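
The foreground sampling step in item 9 can be pictured with the short Python sketch below. It is an assumption-laden illustration rather than the original module: the `SegmentEntry` fields and `process_cycle` function are hypothetical names, the base rate of 20 cycles per second reflects the 50 millisecond program cycle stated in the Conclusions, each sampling rate is assumed to divide 20 evenly, and the buffer here simply stores raw errors instead of intermediate transform results.

```python
from dataclasses import dataclass, field

CYCLES_PER_SECOND = 20  # 50 ms program cycle


@dataclass
class SegmentEntry:
    """One parameter to be sampled for the current segment (hypothetical)."""
    parameter: str          # raw data abbreviation, e.g. "ALT"
    rate_hz: int            # sampling rate; assumed to divide 20 evenly
    desired: float          # desired value of the parameter
    buffer: list = field(default_factory=list)  # collection buffer


def process_cycle(cycle, entries, raw):
    """Run once per program cycle.

    cycle:   integer cycle counter (0, 1, 2, ...)
    entries: list of SegmentEntry for the active segment
    raw:     dict mapping parameter abbreviation -> current value
    """
    for e in entries:
        interval = CYCLES_PER_SECOND // e.rate_hz  # cycles between samples
        if cycle % interval == 0:
            # Time to sample: fetch the parameter and store its error.
            e.buffer.append(raw[e.parameter] - e.desired)
```

A table-driven loop of this kind keeps the per-cycle work small, since parameters that are not due in a given cycle cost only one modulo test.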

10. Input/Output (IO). The following changes were made:

a. Deletions to eliminate the residual GCA and IDIIOM
display functions.

b. The addition of a H$WE0D" teletype input command which
outputs an end-of-file record to the magnetic tape.

11. Parameter Update (PUP). A new background module to compute
heading, bank angle, roll rate and pitch angle, which was
previously accomplished in the foreground mode.

12. Exercise Scheduler (EXSC). Eliminated GCA and Emergency
Procedures routing. Added coding required to save data
needed for PMDP routine.

13. Exercise Terminator (EXTR). Eliminated GCA and Emergency
Procedures routing and the logic to terminate the session
automatically.

14. Post Run Router (PRR). Eliminated GCA and Emergency Procedures
routing.

15. IFM Initialize (IFIN). Eliminated DR$3 bypass routine.

16. IFM Preflight Check (PREF). No changes.

17. Controlled Take-Off (CTO). No changes.

18. Control to Basic IFM Configuration (CIFC). No changes.

19. IFM Task Selector (IFTS). This module was modified to
reflect the following:

a.  Eliminated  the  graphics  display  set-up. 

b. Incorporated the provision for processing the SDT's and
setting up the appropriate real-time tables and buffers
for the selected measuring segment.

c. Added the option for a "Leg Complete" cognitronics
message on designated legs.

d.  Provided  for  following  discrete  lamps  in  the  event  of 
cognitronics  failure. 

(1)  Take  Control. 

(2)  Place  Speed  Brake  In. 

(3)  Leg  Complete. 

(4) Stop Controlling Aircraft.

(5) Good Run.

e. Incorporated a subroutine STPROC which is called from
the GPM module. The purpose of this subroutine is to
process the SMT/TMT tables and test the associated
SMC/TMC's to determine if measurement of the corresponding
segment is to start or terminate.

f. Provided coding to allocate storage for saving the
Absolute  Average  Errors  generated  for  Heading,  Roll  and 
Turn  Rate  in  the  original  IFM  program. 

g.  Initialization  of  the  magnetic  tape  output  buffer 
(MTBUFFER) . 

20. General Performance Monitor (GPM). This module was modified
to incorporate the following changes:

a.  Provide  a computation  of  the  absolute  heading  and 
altitude  differences. 

b.  Open  outer  limits  on  all  parameters  to  prevent  the  run 
from  premature  termination. 

c.  Provide  linkages  for  the  Performance  Measurement  real- 
time ir.  jdules . 


21. 

22. 


d.  Compute  and  save  the  Absolute  Average  errors  for  Heading, 
Roll  and  Turn  Rate  for  IFM  magnetic  tape  output. 

IFM  Display  List  Update  (IDII).  This  module  was  deleted  for 
the  Performance  Measurement  program. 

IFM  Data  Processing  (IDP) . The  following  changes  were  made 
to  this  module: 


a. The ability to read the console sense switches was
incorporated. Sense switches incorporated and their
meanings are:

   Switch #    Meaning

      1        Use IFM Original scoring
      2        Use Discriminate scoring
      3        Use IFM Normative Data scoring


b. Provide linkage for PM data processing module.

c. Provide maneuver and scoring data for the magnetic tape
output buffer (MTBUFFER). Sort, process and store all
parameters, transforms, student file data, etc.,
collected by the PMDP module for end-of-run output to
magnetic tape.

d. Provide the linkage to call the magnetic tape output
routine (MTOUTtl) at the completion of each run.

23. Performance Measurement Data Processing (PMDP). This was a
new module added to provide for line printer and magnetic
tape output of the performance measurement parameters. For
each segment it lists:

a. The maneuver.

b. The parameters measured along with:

(1) The desired value of the parameter.

(2) The transforms performed.

(3) The raw value of the transform.

(4) The weighting factor of the transform.

(5) The weighted value of the transform.

c. The mean and standard deviation of the total sample
score.

d. The weighted score and adjusted weighted score for each
segment and the total exercise.

e. The scoring mode (IFM Original, IFM Normative or
Discriminate).

f. The adaptive logic increment selected (dependent upon
the scoring mode).

The following features were also incorporated:

a. For Discriminate scoring an upper limit was placed on
parameters which have negative weighting factors. If
this limit was exceeded by the raw measure value, a
maximum adjusted weighted score of 2.7 was used.

b. The set-up of the magnetic tape buffer (MTBUFFER) for
segment dependent parameters (pointers, weights, raw
values, weighted values, means, standard deviations,
etc.).

c. A subroutine (PMTRAN) that transfers all performance
measurement data generated in the foreground SDT:1
module to a working buffer to be processed by the PMDP
background module.
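
The segment listing and the Discriminate limit described above can be illustrated with a short sketch. It is not the report's code. The names `Measure` and `segment_score`, the per-parameter `upper_limit` field, and the choice to compute the adjusted score by standardizing against the sample mean and standard deviation are all assumptions made for illustration; only the 2.7 ceiling and the rule that it applies to negatively weighted parameters come from the text.

```python
from dataclasses import dataclass

MAX_ADJUSTED_SCORE = 2.7  # ceiling quoted in the report


@dataclass
class Measure:
    name: str                          # hypothetical label for a parameter/transform pair
    raw: float                         # raw value of the transform
    weight: float                      # weighting factor of the transform
    upper_limit: float = float("inf")  # assumed limit; only checked for negative weights


def segment_score(measures, sample_mean, sample_sd):
    """Return (weighted score, adjusted weighted score) for one segment."""
    # Weighted value of each transform is weight * raw; the segment score is their sum.
    weighted = sum(m.weight * m.raw for m in measures)
    # Assumption: the "adjusted" score standardizes against the total sample score.
    adjusted = (weighted - sample_mean) / sample_sd
    # Report: if a negatively weighted parameter exceeds its upper limit,
    # the maximum adjusted weighted score is used instead.
    if any(m.weight < 0 and m.raw > m.upper_limit for m in measures):
        adjusted = MAX_ADJUSTED_SCORE
    return weighted, adjusted
```

With two hypothetical measures, one of them negatively weighted, the ceiling replaces the computed adjusted score only when the negatively weighted raw value passes its limit.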

24. IFM Adaptive Logic (IAL). This module was modified to permit
the adaptive logic to operate on the original IFM score, the
IFM Normative score or the Discriminate score depending upon
the setting of the console sense switches (see 22 above).

25. PM Parameters (PPAM). This is a new module which contains
the WIND subroutine and PM associated data tables. The
WIND subroutine locates the appropriate weighting factor
as specified by the maneuver, configuration, parameter and
transform. WFTAB is a table of weighting factors indexed by
these four items. MCTAB is a table of means and standard
deviations for the maneuver and configuration.
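
The table lookup performed by WIND can be pictured as a keyed search. The sketch below is an illustration only: the table layout, the key strings and the numeric values are invented, since the report does not list the contents of WFTAB or MCTAB, and the lowercase function names `wind` and `norms` are hypothetical stand-ins.

```python
# Hypothetical contents; the real WFTAB and MCTAB values are not given in the report.
WFTAB = {
    # (maneuver, configuration, parameter, transform) -> weighting factor
    ("climbing_turn", "clean", "heading", "aae"): 0.42,
    ("climbing_turn", "clean", "roll", "aae"): -0.17,
}

MCTAB = {
    # (maneuver, configuration) -> (mean, standard deviation)
    ("climbing_turn", "clean"): (1.3, 0.6),
}


def wind(maneuver, configuration, parameter, transform):
    """Locate the weighting factor for one parameter/transform pair."""
    return WFTAB[(maneuver, configuration, parameter, transform)]


def norms(maneuver, configuration):
    """Return the (mean, standard deviation) pair for a maneuver and configuration."""
    return MCTAB[(maneuver, configuration)]
```

A missing key raises `KeyError` in this sketch; how the original subroutine handled an unlisted combination is not stated.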

26. AFT Subroutines (ASUB). This module was expanded to include
a floating point to fixed number conversion (FIX), a
hexadecimal to ASCII conversion (HEXASC) and an EBCDIC
number to hexadecimal conversion (BCDTOHEX).

27. Cognitronics Message Processor (COG). This module was
altered to permit bypassing of the cognitronics output in
the event of a hardware failure.

28. Convert Floating Point to Cognitronics Addresses (CADR).
No change.

29. Data Recording (DREC). This is a new module that outputs
the magnetic tape buffer (MTBUFFER) as one physical record
to alternate tape units 80 and 81.
APPENDIX E

TYPICAL  TRAINING  PROFILES 


Training profiles of typical students with each of the
three measurement subsystems are shown in Figures 2-4 on
the following pages.

Figure 3. Typical Group II (Discrim) Subject Performance

Figure 4. Typical Group III (NORM IFM) Subject Performance

APPENDIX F

COMMENTS ON AUTOMATED TRAINING SYSTEM DESIGN


Although  the  purpose  of  our  work  was  to  develop  and  evaluate 
measurement,  several  comments  on  the  design  of  automated  training 
systems  can  be  made  on  the  basis  of  the  training  problems  that 
were  observed.  These  comments  might  be  helpful  to  designers  of 
next  generation  systems. 

IFM  was  designed  under  some  inherent  constraints  that  may 
have  prevented  it  from  being  a good  instructor.  For  example, 
when  a pilot  needed  help  most  during  early  training,  the 
coaching  messages  lagged  too  far  behind  his  performance  to  be  of 
value.  There  was  no  priority  system  to  evaluate  and  correct  the 
most  important  errors  first.  Neither  were  there  any  judgments 
about  the  reasonability  of  performance  taking  subject  experience 
into  account.  Also,  it  was  not  possible  to  construct  voice 
coaching  on  piloting  technique  and  finesse  to  the  extent  that  a 
good  instructor  would. 

The syllabus and adaptive logic design of IFM may not lead
to efficient training. Recall the system design (Section IV).
The system required the student to master each exercise to
end-of-training proficiency levels before advancement to the next
exercise. There were many exercises within a maneuver that were
composed of task variations, ordered with increasing "difficulty."
The adaptive logic only moved the trainee up or down this list
of exercises.

This kind of adaptive logic produced at least two problems
related to inefficiency. First, when a student encountered an
exercise that was difficult and performed poorly, the adaptive
logic often set him back to an exercise he had already passed.
Because it had set him back, he often had to perform several
trials on exercises he had already passed before he could retry
the problem exercise. Second, there were too many exercises
contained within each maneuver, tending to force the good pilot
to perform unnecessary trials.
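
The setback behavior described above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the IFM logic itself: the function name, the two thresholds and the convention that a higher score is better are all hypothetical.

```python
def linear_single_score(scores, pass_mark, fail_mark, n_exercises):
    """Walk an ordered syllabus using one score per trial.

    Returns the exercise index attempted on each trial. A score at or above
    pass_mark moves the trainee up one exercise; a score below fail_mark
    moves him down one; anything in between repeats the exercise.
    """
    level = 0
    attempted = []
    for score in scores:
        if level >= n_exercises:
            break  # syllabus complete
        attempted.append(level)
        if score >= pass_mark:
            level += 1
        elif score < fail_mark and level > 0:
            level -= 1
    return attempted


# One poor trial on exercise 2 forces a repeat of exercise 1 before the retry.
trace = linear_single_score([0.9, 0.9, 0.2, 0.9, 0.9], 0.7, 0.4, 4)
```

In this trace the trainee flies exercises 0, 1, 2, 1, 2: the single poor trial costs one extra trial on an exercise already passed, and a larger setback would cost more.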

We are not convinced that adaptive logics that require
movement through a syllabus based on a single score produce the
most efficient training. Performance is multidimensional, and
measurement can be made to diagnose at least major problems.
For example, if a student during climbing and diving turns has
problems controlling the turn, that problem is easily measured,
and the logic might branch the student to a level turn exercise
to at least check his ability to handle level turns.

The net result of this adaptive logic, which we shall call
linear single score, is that it will very likely lead to
automated training systems that increase the time required to
train in operational settings relative to more traditional methods.
This kind of result would be very unfortunate because the
problem is not the concept of automated training, but the proper
use of automated systems and the design of adaptive logics,
syllabus exercises and measurement systems.

Certainly  the  cost  and  utility  of  producing  very  smart 
automated  systems  should  be  strongly  considered  during  system 
definition. Making a system as smart as a good instructor during
early training might be very complex indeed. Once a student
has  an  initial  grasp  of  technique,  however,  automated  systems 
might  have  good  utility  for  presenting  a variety  of  problems  for 
practice,  skill  improvement,  test  administration  and  performance 
assessment.  This  utilization  might  represent  a reasonable  cost 
trade  in  terms  of  system  complexity,  would  unburden  the 
instructor  of  the  routine,  so  that  he  could  concentrate  on  early 
training  and  student  problems,  and  would  provide  a convenient 
system  for  performance  measurement  and  assessment. 

Where  there  are  special  technique  problems,  such  as  learning 
the  proper  skill  to  control  vehicles  in  unstable  regimes, 
separate  subsystems  may  be  designed  to  specifically  address  the 
teaching  of  technique  alone.  Continuous  adaptation  of  vehicle 
characteristics  may  have  application  in  this  area. 

Improved performance of linear single score adaptive logic
might result if backward movement through the syllabus were
inhibited or eliminated altogether, and if the number of
exercises within a maneuver were limited. Syllabus construction
requires  a great  deal  of  care  and  operational  input.  The 
composition  of  exercises  should  be  related  to  tasks  which  must 
be  trained.  The  addition  of  exercises  which  create  only  task 
variation  should  not  slow  down  training;  these  exercises  might 
be  considered  "lateral"  to  the  main  line,  and  successful 
performance  on  them  should  cause  graduation  to  the  next  "main 
line"  exercise. 

Adaptive  logics  can  be  constructed  to  make  judgments  based 
on  performance  norms  relative  to  the  student's  time  in  training 
or  experience  level.  Early  in  training  a student  may  not  need  to 
perform within 2-sigma of end-of-course criteria. If the student
is within the performance range of other students of his
experience (and those norms converge on terminal criteria), then the
student is performing as expected, and should advance. Systems
can  be  designed  to  start  operation  with  assumed  norms  that  can  be 
adjusted  after  sufficient  data  are  accrued. 
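
A norm-referenced advancement rule of this kind might look like the sketch below. The class name `ExperienceNorm`, the `min_samples` threshold, the one-sided 2-sigma test and the higher-is-better score convention are assumptions chosen for illustration; the text specifies only that assumed norms are used at first and adjusted once sufficient data are accrued.

```python
from statistics import mean, stdev


class ExperienceNorm:
    """Performance norm for students at one experience level."""

    def __init__(self, assumed_mean, assumed_sd, min_samples=10):
        self.assumed_mean = assumed_mean
        self.assumed_sd = assumed_sd
        self.min_samples = min_samples  # hypothetical cutoff for "sufficient data"
        self.samples = []

    def add(self, score):
        """Record an observed score from a student at this experience level."""
        self.samples.append(score)

    def current(self):
        """Return (mean, sd): measured if enough data exist, else the assumed values."""
        if len(self.samples) >= self.min_samples:
            return mean(self.samples), stdev(self.samples)
        return self.assumed_mean, self.assumed_sd

    def should_advance(self, score, k=2.0):
        """Advance if the score is no worse than k sigma below the norm."""
        m, s = self.current()
        return score >= m - k * s
```

One `ExperienceNorm` would be kept per experience bracket, so that a student early in training is judged against his peers and not against end-of-course criteria.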

Branching  logics  may  have  utility  where  performance  is 
expressed  by  more  than  one  score.  For  example,  a small  set  of 
criterion  test  exercises  can  be  constructed.  Failure  to  pass 
those  exercises  would  result  in  branching  to  either  task  variation 
exercises  or  remediation  exercises.  If  successfully  passed, 
remediation  exercises  should  point  to  the  last  attempted  criterion 
exercise,  but  task  variation  exercises  should  point  to  the  next 
criterion  exercise. 
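
The pointer rules in the preceding paragraph can be written as a small transition function. This is a sketch, not a specification: the function name, the `needs_remediation` flag used to choose a branch after a failed criterion exercise, and the rule that a failed branch exercise is repeated are assumptions. The two rules taken from the text are that a passed remediation exercise returns to the last attempted criterion exercise and a passed task variation exercise moves on to the next one.

```python
def next_exercise(kind, index, passed, needs_remediation=False):
    """Return (kind, criterion index) of the exercise to fly next.

    kind is "criterion", "remediation" or "variation"; index is the number of
    the criterion exercise the trainee is currently working toward.
    """
    if kind == "criterion":
        if passed:
            return ("criterion", index + 1)
        # Failure branches off the main line; which branch is an assumed diagnosis.
        return ("remediation" if needs_remediation else "variation", index)
    if kind not in ("remediation", "variation"):
        raise ValueError(f"unknown exercise kind: {kind}")
    if not passed:
        return (kind, index)  # assumption: repeat the branch exercise
    if kind == "remediation":
        return ("criterion", index)   # back to the last attempted criterion exercise
    return ("criterion", index + 1)   # task variation points to the next criterion exercise
```

For example, a trainee who fails criterion exercise 3 with a diagnosed weakness is sent to remediation, and on passing it returns to criterion exercise 3; had he been sent to a task variation exercise, passing it would take him to criterion exercise 4.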